You've got the big data and you've got to make it meaningful. Presenting big data is more complex than it looks.
This guest post from Juice Analytics founder and CEO Zach Gemignani examines the best way to present data. This post was originally published on the Juice Analytics blog under the title 'Chart Selection, Art and Science'.
Choosing the right chart for data presentation isn't easy, even if you do it for a living. For those with less practice, it may resemble the flash of confusion I experience when my wife asks "Which of these outfits looks best on me?"
And like that answer, there isn't any safety in sitting on the fence.
Wouldn't it be nice if there was a formula for choosing the right chart? The fact that there isn't suggests it is a mix of art and science. There are plenty of examples of people who have taken a crack at this problem:
- Andrew Abela created a diagram that categorizes chart types.
- In Stephen Few's book Show Me the Numbers, Chapter 5 provides an overview of graph fundamentals. Bonus: I received the following Graph Selection Matrix (PDF) from Steve.
- In Stephen Kosslyn's book Graph Design for the Eye and Mind, Chapter 2 is entitled "Choosing a Graph Format"
- Sanket Nadhani shared this short tutorial which tackles the basic choices.
- From NC State, a flow diagram for chart selection
- An Oracle-financed white paper entitled: "Selecting the Best Graph Based on Data, Tasks, and User Roles" (PDF)
- BonaVista Systems has an Excel add-in for choosing the right chart.
(If you know of any others, put them in the comments and I'll add to this list.)
While these are all great resources, I thought it could be instructive to walk through a sample chart selection process, starting simple then gradually adding more complex requirements. The focus of this post is on 'wireframing' the correct presentation techniques; in a follow-up we'll replicate these same charts noting best practices with refined aesthetics and layout.
I typically ask four questions in choosing how to present data:
1. What data is important to show? Specifically, which dimensions and metrics need to be shown at the same time?
2. What do I want to emphasize in the data? For example, do I want to compare different values, show relationships, or present changes over time? What story am I trying to tell?
3. What options do I have for displaying this data? Your Excel chart menu is a start, but don't forget options such as tables, sparklines, small multiples, and advanced visualizations like treemaps. Many Eyes' list of visualizations can spark additional ideas.
4. Which option is most effective at communicating the data? Which chart or visualization emphasizes what's important in the most direct and readable way?
Imagine a sales organization where two metrics matter most: activity (as measured by call volume) and sales (as measured by dollars sold). The simplest place to start with this data is to present aggregate performance for those two measures. Even with this most basic situation, you have a few options:
Conclusion: Data doesn't always need visualizing. The common and dreadful example of this mistake is when people use a speedometer-style gauge to show a single number (option 3). It is a lot of work, pixels, and distraction for no user value. In this example, we have just a single data point for each measure and no comparisons (e.g. to goals, to last year's performance, or of the values against each other), so it's best to keep things clean with option 1.
Next, let's look at options for showing activity and sales data by product. In this case, the emphasis should be on the relative performance of each product.
Conclusion: Option 1 is the winner. We prefer a vertical layout of labels (bar chart) to a horizontal (i.e. column chart – not shown) because the labels are more readable and the horizontal layout can suggest a time element in the graph. As has been thoroughly documented, a pie chart doesn't allow you to see differences in values as effectively as a bar chart.
What if we wanted to understand these two metrics by time?
Conclusion: I've backed away from using dual axis charts after experiencing too many situations where people are confused by which line goes with which axis, no matter how clearly labeled. Because the emphasis for the data needs to be the trend over time, I would recommend option 2 over option 3's sparklines.
Now it gets interesting: What if we wanted to understand these two metrics by product and by time?
Conclusion: The best option for this case depends on the importance of clearly communicating the detailed trend for each product. In most cases, the "essence" of the trend is good enough, i.e. Is the trend up? Down? Erratic? Smooth? Under that assumption, option 3 provides a nice comparison of the relative product performance and trend.
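For readers who think in code, the four conclusions above can be roughly sketched as a lookup on two questions: is the data broken out by product, and is it broken out by time? This is only an illustration, not a formula. The function name `suggest_presentation` is hypothetical, and the labels for the two time-based cases are assumptions about what the recommended options look like, since the post only refers to them by option number.

```python
def suggest_presentation(by_product: bool, by_time: bool) -> str:
    """Return a rough presentation suggestion for the activity and sales metrics.

    A rule-of-thumb sketch of the walkthrough's conclusions, not a
    definitive chart-selection algorithm.
    """
    if not by_product and not by_time:
        # A single value per metric with nothing to compare against:
        # the post recommends skipping the chart entirely.
        return "plain numbers"
    if by_product and not by_time:
        # Relative performance across products: bar chart beats pie chart.
        return "bar chart"
    if by_time and not by_product:
        # Assumption: the recommended option is one line chart per metric.
        # The post only says to avoid dual-axis charts and to prefer
        # this option over sparklines when the trend is the emphasis.
        return "separate line charts"
    # Assumption: the recommended option pairs a per-product comparison
    # with a compact trend, on the premise that the "essence" of the
    # trend is good enough in most cases.
    return "bars with sparklines"


if __name__ == "__main__":
    for by_product in (False, True):
        for by_time in (False, True):
            print(by_product, by_time, "->", suggest_presentation(by_product, by_time))
```

A lookup like this covers only the second and third questions from the list above. Deciding what data matters and what story to tell still comes first, which is where the art comes in.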
A few final observations:
- Labeling matters. How labels are laid out in a chart can make a big difference in readability. It is almost always better if the label text can be written horizontally and be closely tied to the value (rather than in a disconnected legend).
- Multiple areas of emphasis. There will be compromises when you need to emphasize two things simultaneously (trend, relative values). Pick which one matters most.
- Know your options. The more types of charts you know and understand how to apply, the better the set of options you'll be able to come up with.