COVID-19 In Charts: Examples of Good & Bad Data Visualization

Picture of Earth

Header Photo by NASA on Unsplash

With the world on lockdown as a result of Covid-19, one thing I've noticed over the last few months has been a steady rise in interest in charts and data visualization. From the mainstream media to my social feeds, to even Zoom calls with friends and family, there is rarely a time when a reference to the latest Covid-19 data or chart doesn't come up.

Consider for a moment one of the most popular sources of Covid-19 case data, Worldometer. In the months before the outbreak, their website averaged around 3.5 million visits per month. In March, the website almost broke through a billion visits! This steep rise in traffic volume has pushed their global site ranking from around 9,000 a few months back to a rank of 80 today. Seriously, somebody's going to the bank with all that incremental ad revenue.

Source: SimilarWeb, Monthly Site Visits for worldometers.info

More charts, but not necessarily more good charts

I think it's great that more and more people are taking an interest in data visualization. It's also great to see a larger share of mainstream media using DataViz more frequently to help tell their stories and inform the public. The New York Times has historically been a leader in this, but I've noticed a lot more news media waking up to the power of data-driven storytelling over the last year. The BBC, for example, did some solid reporting during the Australian bushfires earlier this year, using charts to help their readers understand the scale of what was happening. They even warned readers about some potentially misleading graphs.

That said, building an effective DataViz, like any skill, requires a solid conceptual foundation. Most information designers spend years learning and perfecting their craft (if you're looking for a good place to start, then follow Edward Tufte and Cole Nussbaumer Knaflic). As more people engage with data because of Covid-19, I've also seen many people, businesses and news media creating and sharing poorly constructed and sometimes misleading charts.

“The purpose of visualization is insight, not pictures”

The quote above, from American computer scientist Ben Shneiderman, is especially important in this context. When we visualize data, we usually do it to increase the speed at which one can process data and reach a conclusion. In other words, to increase speed to insight.

Let's put this into perspective.

From raw to refined

Imagine raw data for a moment. What does it look like? The first thing you probably think of is a table, but that's not really raw data. Web analytics tools like Google Analytics, for example, typically capture data in log files through browser pings, and in its rawest form it looks something like the picture below (but most people will never see the data in this format).

Data Log

Source: https://blog.dataiku.com/build-your-own-google-analytics

You won't get much insight from this. For most of us to effectively read data, we need a visual structure where we isolate certain metrics and dimensions. And this is where tables come in.

Table Data

Tables are much easier to read than log files, but they're still not necessarily the best way to read and interpret data efficiently. And this is precisely why we use charts.

Google Analytics Charts

Charts offer visual cues that help us process information faster and more efficiently. After all, humans are visual creatures!

But not all charts are good charts. I've covered this topic quite a bit on my blog in the past, with examples here and here. Sometimes people create bad charts to deceive their audience in order to push their agenda or worldview. But most of the time, terrible charts are a result of the creator simply not knowing any better. There's both an art and science to being a compelling data storyteller, and perfecting your craft starts with building a solid foundation in the principles of information design.

So today, I wanted to share some specific examples of charts I've come across recently that depict Covid-19 stats, both good and bad.

Let’s start with the bad

Here's a chart created by and shared on Channel News Asia back in March. It plots two data series, daily new cases and daily patients discharged, over three months.

Source: Channel News Asia

So what's wrong with it?

A few things. First, there's too much "chart junk", or what Edward Tufte calls "non-data ink". The key principle here is to limit the amount of noise and unnecessary formatting so the reader can focus on what the data has to say. In this case, including elements like the chart grid and data labels on each bar adds unnecessary clutter.

Another issue is that a bar chart doesn't suit the profile of this data. When you have time-series data with very granular time intervals (e.g. daily vs monthly), this is usually (but not always) a strong signal to use a line chart. In the example above, we have almost three months of data with a daily time interval and with two different data series (i.e. new cases vs discharged). As a result, the bars on the chart are thinned out with very little white space between them, making it hard to read.

To fix this, your first inclination might be to convert this to a line chart, but this doesn't really make the chart more readable, as you can see below.

Covid-19 Singapore New Cases vs Discharged: Line Chart

Generally speaking, when you have time series data with granular time intervals, a line chart is the way to go. But the data in this chart is problematic for a standard line graph for a few reasons. First, we're dealing with small data, with a y-axis scale range of just 13 points. Second, the data is volatile. And when you add this to a daily level time series, you simply need to start looking for more creative ways to present your data.

There are a few ways we can go about fixing this. One approach would be to break the data across two charts, maintain the core data series as a bar chart but then add a trend line using a moving average. And if we apply some smart formatting, the data will speak a little more clearly. Here's an example.

Covid-19 Singapore New Cases vs Discharged: Split Bar Chart with Moving Average

In this chart, I've made the bars more transparent, so they're no longer the core focus, though the underlying data is still present. Then, I plotted a trend line using a 5-day moving average. Moving averages smooth out day-to-day volatility, which, much like a logarithmic scale, makes trends easier to see. But you'll have to choose the right interval based on the time frame of your data. Since I have roughly three months of data, I've configured my trend line to show a 5-day moving average. If I had a longer time frame, say one year, I might consider something wider, like a 14- or 28-day moving average.
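If you'd like to see what the smoothing step involves, here's a rough sketch of a moving average in plain Python. The daily counts are made up for illustration (they are not the Singapore figures), and the function name is hypothetical. I've also assumed a trailing window, where each point averages the current day and the four days before it; some charting tools centre the window instead.

```python
def moving_average(values, window=5):
    """Trailing moving average. The first window - 1 points are None
    because there isn't a full window of data behind them yet."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    for i in range(len(values)):
        if i + 1 < window:
            smoothed.append(None)
        else:
            recent = values[i + 1 - window : i + 1]
            smoothed.append(sum(recent) / window)
    return smoothed


# Hypothetical daily new case counts, for illustration only.
daily_new_cases = [2, 0, 5, 3, 9, 1, 12, 4, 6, 13]
trend = moving_average(daily_new_cases, window=5)
```

Plotting `trend` over the faded bars gives the trend line effect described above. Widening `window` to 14 or 28 would smooth the line further, at the cost of reacting more slowly to real changes.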

The trend line approach shown here is a technique I see used by the New York Times quite often (example here), and I think it's a very effective way to de-clutter a chart.

Another way to present this data more effectively would be to use a diverging bar chart, where you mirror the scale beneath the x-axis with the second data series. This approach will definitely involve a little more thoughtfulness and formatting to reach the desired outcome. But in the end, I think it's worth it.

Covid-19 Singapore New Cases vs Discharged: Diverging Bar Chart

The key changes here that I think make the chart easier to read are the removal of the data labels on every bar, and the widening of the bars, which makes them easier to distinguish. I've also added some derived metrics, average daily new cases and average daily patients discharged, on the right side to offer some extra insight as a substitute for the data labels on each bar.
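The data prep behind a diverging bar chart is simpler than it looks. Here's a minimal sketch in plain Python, using made-up numbers rather than the CNA data; the function names are hypothetical. The idea is just to negate the second series so it plots beneath the x-axis, and to compute the two averages shown on the side.

```python
def diverging_series(above, below):
    """Return the first series as-is and the second negated, so a
    charting tool draws it mirrored beneath the x-axis."""
    if len(above) != len(below):
        raise ValueError("both series need one value per day")
    return list(above), [-value for value in below]


def daily_average(values):
    """Simple mean of a daily series, used as a derived metric."""
    return sum(values) / len(values)


# Hypothetical daily counts, for illustration only.
new_cases = [2, 0, 5, 3, 9, 1, 12]
discharged = [1, 1, 0, 4, 2, 6, 3]

bars_up, bars_down = diverging_series(new_cases, discharged)
avg_new_cases = daily_average(new_cases)
avg_discharged = daily_average(discharged)
```

When you plot `bars_down`, remember to format the axis labels as absolute values, otherwise readers will see negative patient counts.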

Ok, on to the next example.

The CNA chart above is pretty forgivable, as it boils down to just some poor design choices. But this next chart is something special. Enter, Time Magazine.

Source: Time Magazine

This chart appeared in an article Time published back in March. Hindsight is 20/20, but it's safe to say that the author's central thesis hasn't aged well.

“Even when taking the current estimated global mortality rate of 3.4% at face value, COVID-19 looks more like influenza than other once-novel coronaviruses.”

So, what exactly is wrong with the chart above? Pie charts show composition, and although a mortality rate does represent a subset of a whole value, in context, these charts are downright misleading. The first thing we need to consider is that the Covid-19 pandemic is still unfolding. So, it's not fair to compare mortality rates for Covid-19 to previous outbreaks that have run their course. The seasonal flu data, for example, is an annual average compiled over many years of data collection, and it's only for the U.S. So this isn’t an apples-to-apples comparison.

There's also some essential context missing from these charts. How easily a disease spreads, measured by the basic reproduction number (known as R nought, or R0 for short), can have an enormous impact on the total number of deaths. The science of Covid-19 is still unfolding, but the latest estimates put Covid-19's R0 close to 2.5, almost double that of the seasonal flu.
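To get a feel for why this matters, here's a deliberately crude sketch in Python. It assumes every infected person passes the disease on to exactly R0 others in each "generation" of spread, with no immunity, distancing or other interventions, which is not how real epidemics behave. The flu value of 1.3 is my assumption, chosen only to be consistent with "almost double" above, and the function name is hypothetical.

```python
def infections_in_generation(r0, generation):
    """New infections in a given generation of spread, starting from a
    single case. Toy model: every case infects exactly r0 others."""
    return r0 ** generation


# 2.5 is the Covid-19 estimate quoted above; 1.3 is an assumed flu value.
covid_like = infections_in_generation(2.5, 10)
flu_like = infections_in_generation(1.3, 10)
ratio = covid_like / flu_like
```

After ten generations, the toy model gives roughly 9,500 new infections for the higher R0 versus about 14 for the lower one. Even with an identical mortality rate, the gap in total deaths would be enormous, and that's exactly the context a mortality-rate pie chart leaves out.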

All of this adds up to several major flaws with this chart. First, a pie chart isn't the best way to show this data. Second, only offering mortality rates on this graph neglects essential context the reader might need. Third, as a matter of principle, it's not all that productive to be comparing Covid-19's mortality rate at this stage of the pandemic. So the lesson here is context; it's everything.

Ok, one more lousy chart before we look at some good ones.

Source: Business Insider

This one's from Business Insider, and was published in March. Formatting and presentation-wise, this one's not bad. They've followed Tufte's data-ink ratio rule, and aesthetically, the chart is well put together. They've even made some sound design choices, like ensuring both charts use the same y-axis range as well as extending the horizontal grid lines across both plots.

The problem is with how the author has compared the data. First, the data on the left side is based on seasonal flu trends in the U.S. only, whereas the data on the right side is based on South Korea's Covid-19 case data. These are not directly related datasets, so you have to be careful when comparing data from two different time periods, countries and diseases (i.e. seasonal flu vs Covid-19).

But the bigger issue here lies with the x-axis, as the two charts use completely different age breaks, making it difficult and confusing for the reader to draw comparisons. This is likely a result of how the original datasets were structured. But the creative team at Business Insider could have fixed this, and they should have.

Again, I don't think comparing historical seasonal flu data in the U.S. to current Covid-19 case data in South Korea is all that useful to start with. But if you had to do it, there are ways that you can try to make the axes more comparable. Both sets of age breaks have something in common, which is a break at 50 years old. So you could calculate an average for the below-50 and above-50 age groups for both diseases. You would have to factor in the actual case data, though, and calculate a weighted average. But it's doable and would make the data much easier to compare.
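Here's a minimal sketch of that weighted average in Python. The brackets, case counts and rates below are invented for illustration and don't come from either dataset; the function name is hypothetical too. Each bracket's rate is weighted by its number of cases, so large brackets count for more than small ones.

```python
def weighted_rate(brackets):
    """Combine per-bracket rates into a single rate, weighting each
    bracket by its case count. Expects (cases, rate) pairs."""
    total_cases = sum(cases for cases, _ in brackets)
    if total_cases == 0:
        raise ValueError("no cases in these brackets")
    return sum(cases * rate for cases, rate in brackets) / total_cases


# Hypothetical (cases, mortality rate) pairs for two under-50 brackets.
under_50 = [(1000, 0.001), (3000, 0.002)]
rate_under_50 = weighted_rate(under_50)

# A plain average of the two rates ignores bracket size.
naive_rate = sum(rate for _, rate in under_50) / len(under_50)
```

In this made-up example the weighted rate comes out higher than the plain average, because the larger bracket also has the higher rate. Skipping the weighting would quietly distort the comparison.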

All right, now for the good

So we covered a few examples of bad DataViz about Covid-19; now for a few examples of the good!

The next graphic is one of my favourites. It was created by Vox and shows the evolution of testing across different countries affected by Covid-19. This data is now very much out of date, as testing in the U.S. has ramped up considerably. But at the time of its release, I thought this was one of the most insightful and well-constructed charts on Covid-19 to date. It really put things into perspective for me in terms of testing capacity relative to population size.

Source: Vox

Next up, below is a chart from the New York Times. This one is not about Covid-19 case data, but rather the effect that the outbreak is having on the U.S. economy. It looks simple, but it's masterfully constructed. Although it appears to be a mix between an area and line chart, what we're looking at is a bar chart over an extended period with a very granular time interval (probably weekly). There's some excellent formatting applied here to help you know where to look, such as the grey shading that marks the 2008 recession, the use of scale, and the fact that the chart wraps around the article header and lead paragraph. Awesome stuff.

Chart from New York Times about Covid-19 effect

And finally, here's a great chart that was published by the Visual Capitalist. The graphic is fully interactive, so I recommend heading over to check it out.

Source: Visual Capitalist

The author who created this graphic wasn't the first to apply this technique, but there is some brilliant thinking here in terms of how to normalize and show Covid-19 case data for different countries. The core dataset is total cumulative Covid-19 cases by country, and it uses a log scale on the y-axis to help show the trends in the data (vs the more traditional linear scale).

Using a log scale is very useful here as it makes the data easier to read. But the smartest thing about this chart can be found on the x-axis. The graphic doesn't actually plot all case data. Rather, it shows cases since the 100th confirmed case in each country. Why? Because the rate at which the infection spread varied considerably across countries. The U.S. and South Korea, for example, had their first confirmed cases around the same time, but the rise of cases in the U.S. came much later than in South Korea. So the x-axis is no longer fixed in time, as it now shows the number of days passed since the 100th case. By cancelling out the time it took each country to reach 100 cases and by applying a log scale, the result is probably the best static chart you can find for comparing the rise in Covid-19 cases across countries.
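Both tricks are easy to reproduce. Below is a rough sketch in Python using invented cumulative case counts for two hypothetical countries (not the Visual Capitalist data); the function name is an assumption too. It drops everything before the day a country first reaches 100 cases, so that index 0 becomes "day 0" for every country, then takes the base-10 log of each value.

```python
import math


def since_threshold(cumulative_cases, threshold=100):
    """Return the series starting from the first day the cumulative
    count reaches the threshold. Empty if it never gets there."""
    for day, total in enumerate(cumulative_cases):
        if total >= threshold:
            return cumulative_cases[day:]
    return []


# Hypothetical cumulative case counts by calendar day.
cases_by_country = {
    "Country A": [1, 3, 20, 100, 250, 600, 1400],
    "Country B": [1, 1, 2, 2, 5, 40, 130, 320],
}

# Re-index each country so position 0 is the day it passed 100 cases.
aligned = {name: since_threshold(series) for name, series in cases_by_country.items()}

# Log-transform for plotting; most charting tools can do this on the axis instead.
log_aligned = {name: [math.log10(v) for v in series] for name, series in aligned.items()}
```

In practice you would usually plot the raw counts and simply switch the y-axis to a log scale in your charting tool. The explicit transform here just shows what that setting does: each step up the axis means ten times more cases, so steady exponential growth shows up as a straight line.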

There are obvious limitations with the data, such as accounting for testing capacity. But all things considered, this is a thoughtful and well-constructed graphic.

So that's all for today! I hope you enjoyed this post, and if you have more examples of good and bad charts about Covid-19, I'd love to see them in the comments below.

Stephen Tracy

I'm a designer of things made with data, exploring the intersection of analytics and storytelling.

https://www.analythical.com