tl;dr: Should you label the individual dots in a scatterplot? Should you add a trend line? Should you divide the chart area into quadrants? The answers to these (and most other scatterplot design best practice questions) depend on which type of scatterplot you’re designing: a “correlation scatterplot” or an “item-comparison scatterplot”.
I’m going to assume that, if you’ve decided to read this post, you know what a scatterplot (a.k.a., a “scatter chart” or “scatter graph”) is and how to read one. If that’s not the case, Google is your friend (well, not really).
As with any chart type, there are design best practices that we can follow to make our scatterplots more effective. What, exactly, are those design best practices? Well, that depends almost entirely on what type of scatterplot we’re creating.
Heywhatnow? “Scatterplot” is just a single chart type, like “bar chart” or “line chart”, right? Well, IMHO, there are two fundamentally different types of scatterplots, and the design best practices for each type are quite different. In fact, I think of them as two entirely separate chart types, even though that isn’t a common perception.
What are the two types of scatterplots? Well, I’m glad I assumed that you asked…
Type 1: “Correlation” scatterplots
Consider this scatterplot:
If you know how to read a scatterplot, you know that the most obvious insight here is that clients who’ve been with us longer tend to have lower satisfaction ratings, suggesting that we may need to pay more attention to clients who’ve been with us for a long time. More generally, the purpose of this scatterplot is to show how one variable (e.g., Satisfaction Rating) is (or isn’t) related to another variable (e.g., Years as a Client). When the main purpose of a scatterplot is to show the relationship between two quantitative variables like this, I call it a “correlation scatterplot”.
In correlation scatterplots, the important thing is the pattern that the dots form, since different patterns indicate different types of relationships that can exist between two variables, for example:
To make these dot patterns easier to see, it can be useful to add features such as trend lines or confidence intervals to correlation scatterplots, like the trend line in the chart below:
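If you build your charts in code rather than in a charting tool, a straight trend line is typically just an ordinary least-squares fit. Here's a minimal sketch in plain Python. The `fit_trendline` name and the sample numbers are made up for illustration; they aren't the data behind any chart in this post.

```python
from statistics import mean


def fit_trendline(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations in x; zero means every dot shares the same x value.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical data: years as a client vs. satisfaction rating.
years = [1, 2, 3, 4, 5, 6, 7, 8]
ratings = [9.0, 8.5, 8.7, 7.9, 7.2, 7.4, 6.5, 6.0]

slope, intercept = fit_trendline(years, ratings)

# Two endpoints are enough to draw the line across the plotted x range.
line_start = (min(years), slope * min(years) + intercept)
line_end = (max(years), slope * max(years) + intercept)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

A negative slope here matches the "longer tenure, lower satisfaction" pattern described earlier. The two endpoints can be handed to whatever plotting library you use to draw the line over the dots.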
Now, let’s look at the second type of scatterplot. As we’ll see, everything that I just said about correlation scatterplots goes out the window when we’re talking about…
Type 2: “Item-comparison” scatterplots
Consider this scatterplot:
The main purpose of this scatterplot is to show how a set of items (e.g., sales prospects) compare with one another. For example, we can see that InterCorp (in the upper left) has a relatively high revenue potential and a short sales cycle, so we should probably focus more of our sales efforts on them and stop spending time on Eastern Insurance (in the lower right).
The point here isn’t to show the general relationship between the “Estimated Revenue Potential” variable and the “Estimated Sales Cycle” variable. Indeed, the general relationship between the two variables is largely irrelevant in a chart like this. Sure, we can see that relationship as a pattern of dots, but that’s not what this chart is for.
If the main purpose of a scatterplot is to show how a set of individual items compare with one another (and not how two variables relate to each other in general), I call it an “item-comparison scatterplot”. This fundamental difference in purpose impacts most of the best practices that we’d use to design each type of scatterplot, for example…
Dot labels
Because item-comparison scatterplots are all about comparing individual items with one another, it’s virtually always necessary to label the dots individually so we know what each dot is (e.g., which sales prospect each dot represents). In fact, item-comparison scatterplots are basically useless if the dots aren’t labeled. In correlation scatterplots, however, it’s rarely useful or necessary to label individual dots.
Quadrants
Item-comparison scatterplots almost always benefit from being divided into regions (e.g., quadrants) with helpful labels for each region. The “Sales Prospects” scatterplot above, for example, is divided into useful regions labeled “Actively Pursue”, “Nurture”, etc. Indeed, item-comparison scatterplots are sometimes called “two-by-two matrices” because they’re so often divided into a two-region by two-region matrix. In correlation scatterplots, however, dividing the chart area into regions is rarely useful or necessary:
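For an item-comparison chart built in code, the regions boil down to a cutoff on each axis. Below is a rough sketch with several assumptions worth flagging: the cutoff values and prospect numbers are invented, the mapping of “Actively Pursue” and “Nurture” to specific quadrants is my guess at a sensible layout, and the other two region names are placeholders that don't come from the chart.

```python
def assign_region(revenue_potential, sales_cycle_months, revenue_cutoff, cycle_cutoff):
    """Return the label of the quadrant that a prospect falls into."""
    high_revenue = revenue_potential >= revenue_cutoff
    short_cycle = sales_cycle_months <= cycle_cutoff
    if high_revenue and short_cycle:
        return "Actively Pursue"  # assumed placement: high revenue, short cycle
    if high_revenue:
        return "Nurture"          # assumed placement: high revenue, long cycle
    if short_cycle:
        return "Quick Wins"       # hypothetical label, not from the chart
    return "Deprioritize"         # hypothetical label, not from the chart


# Hypothetical values: (estimated revenue potential, estimated sales cycle in months).
prospects = {
    "InterCorp": (900_000, 3),
    "Eastern Insurance": (150_000, 14),
}

# Hypothetical cutoffs that split each axis into two regions.
REVENUE_CUTOFF = 500_000
CYCLE_CUTOFF = 8

regions = {
    name: assign_region(revenue, cycle, REVENUE_CUTOFF, CYCLE_CUTOFF)
    for name, (revenue, cycle) in prospects.items()
}
for name, region in regions.items():
    print(f"{name}: {region}")
```

Because each dot ends up with both a name and a region, the same dictionary can drive the dot labels and the region shading when the chart is drawn.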
Trend lines, confidence intervals, etc.
In an item-comparison scatterplot, the pattern formed by the dots is usually of incidental interest at most. Therefore, features that make dot patterns clearer (trend lines, confidence intervals, etc.) are rarely necessary or useful in item-comparison scatterplots, and may not even make sense:
Yes, there are some design best practices that apply to both types of scatterplots, but most only apply to one type or the other, for example:
Now what?
While both types of scatterplots are based on the same basic idea of plotting dots in a 2-D space based on two quantitative measures (thanks, René!), the similarities pretty much end there. Both types of scatterplots can be very useful, though, and I see both “in the wild” all the time. This means that it’s important for data viz practitioners to know which design best practices apply to each type so that they don’t apply the right best practices to the wrong chart type. The first step, though, is to recognize that there aren’t “scatterplot design best practices”, only “correlation scatterplot design best practices” and “item-comparison scatterplot design best practices”.
Unfortunately, many articles and books that discuss scatterplot design best practices only cover one of the two types of scatterplots without even mentioning the other, and this can be confusing (or worse) if you happen to be designing a scatterplot of the other type. Hopefully, this article will help clear up any such confusion, but please let me know in the comments if any of the above doesn’t jibe with you.