Making of: comparative treemaps
(or “the incredible shrinking treemap”, HT @albertocairo).
Treemaps are a really useful way to understand hierarchical data, but they are not well-suited to side-by-side comparison.
Recently I’ve been working on the World Bank’s Atlas of the Sustainable Development Goals 2017 (which was a large team effort). One highlight that I worked on was a comparative treemap / cartogram of people living in extreme poverty, which is a bit different from the typical treemap.
Here’s one version, an animated GIF that Tariq Khokhar captured (you can see the interactive timeline here):
“How has the number & distribution of people living in #extremepoverty changed? Find out here: https://t.co/ZhzRPKmsKf #SDGAtlas #WDI2017” pic.twitter.com/d0rXz7qSWZ (World Bank, @WorldBank, April 21, 2017)
This turned out to be a popular data visualization: an interesting topic, very colorful, and animated—perfect for retweeting:
@BillGates Progress can be hard to see. This #dataviz makes it easy…
@JimYongKim Incredible drop in #poverty since 1990. Let’s commit to reach our goal to #endpoverty by 2030.
@Noahpinion Wow, best infographic I’ve seen in a while.
From a dataviz perspective, the interesting thing is how we modified the standard treemap to make it suitable for side-by-side comparison.
Treemaps are in a class of geometries - including network layouts, stacked bar charts, pie charts - where the position of marks is usually arbitrary: size and adjacency may contain useful information, but x-y position itself is chosen mostly for aesthetic reasons. Moreover, the position of a mark is dependent on other marks, which means that when the data change, the marks rearrange themselves in a way that is irrelevant and distracting.
You can see this at work in the early draft of our poverty treemap, produced by Hiroki Uematsu, a poverty specialist we worked with:
The concept is there, but all the irrelevant movement makes it hard to parse. You can feel your eyes scanning up and down, left and right, in an attempt to connect the colored regions. This distracts from the immediate, compelling impact of the changes in size.
(The same thing happens with a stacked bar or area chart, which is one reason many people warn against using them.)
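To make that dependence concrete, here is a minimal Python sketch with made-up numbers. The helper `stacked_offsets` is hypothetical, written only for this illustration: it computes where each segment of a stacked bar starts, and shows that shrinking the first segment moves the other two even though their own values are unchanged.

```python
from itertools import accumulate


def stacked_offsets(values):
    """Return the start position of each segment in a stacked bar."""
    # Each segment starts where the running total of the earlier ones ends.
    return [0.0] + list(accumulate(values))[:-1]


# Illustrative values only: three segments, then the same bar after
# the first segment alone has shrunk from 5 to 1.
before = stacked_offsets([5.0, 3.0, 2.0])
after = stacked_offsets([1.0, 3.0, 2.0])

print(before)
print(after)
```

The second and third segments keep their sizes but land somewhere new, which is exactly the kind of movement the eye then has to discount.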
In an interactive setting you can often get away with this, because animated transitions help the viewer understand (and, perhaps, mentally discard) the position changes. But in a print/static output, you don’t have that option.
So we adapted the treemap algorithm in the following way:
Find the maximum value that each country has, over all the years to be displayed.
Use the treemap algorithm to lay out the “maximum boxes”.
For each year, resize the countries as needed about their maximum box centers.
Finally, underneath each year, we painted the maximum boxes partially transparent. I’m still not sure if this purely aesthetic touch was the right call. It has the effect of tying everything together into a contiguous rectangle, as people expect a treemap to be, but it does seem to add some confusion in interpretation.
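The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the atlas: the `layout` function is a simple binary-split treemap standing in for whatever treemap algorithm you prefer, the function names are hypothetical, the data are invented, and every country is assumed to have a positive maximum value.

```python
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # x, y, width, height


def layout(items: List[Tuple[str, float]], rect: Rect) -> Dict[str, Rect]:
    """Stand-in treemap: split items into two groups of roughly equal
    weight, divide the longer side proportionally, and recurse."""
    if len(items) == 1:
        return {items[0][0]: rect}
    x, y, w, h = rect
    total = sum(v for _, v in items)
    acc, k = 0.0, 0
    # Grow the first group until it holds at least half the weight,
    # always leaving at least one item for the second group.
    while k < len(items) - 1:
        acc += items[k][1]
        k += 1
        if acc >= total / 2:
            break
    frac = acc / total
    if w >= h:
        r1, r2 = (x, y, w * frac, h), (x + w * frac, y, w * (1 - frac), h)
    else:
        r1, r2 = (x, y, w, h * frac), (x, y + h * frac, w, h * (1 - frac))
    return {**layout(items[:k], r1), **layout(items[k:], r2)}


def comparative_treemap(data: Dict[int, Dict[str, float]],
                        rect: Rect = (0.0, 0.0, 1.0, 1.0)):
    """data maps year -> {country: value}. Returns (max_boxes, frames)."""
    countries = sorted({c for values in data.values() for c in values})
    # Step 1: maximum value of each country over all years.
    max_vals = {c: max(v.get(c, 0.0) for v in data.values()) for c in countries}
    # Step 2: lay out the "maximum boxes" once, largest first.
    items = sorted(max_vals.items(), key=lambda kv: -kv[1])
    max_boxes = layout(items, rect)
    # Step 3: for each year, shrink each box about its own center.
    frames = {}
    for year, values in data.items():
        frames[year] = {}
        for c, (x, y, w, h) in max_boxes.items():
            # Scale both sides by sqrt(ratio) so area is proportional to value.
            s = (values.get(c, 0.0) / max_vals[c]) ** 0.5
            frames[year][c] = (x + w * (1 - s) / 2, y + h * (1 - s) / 2,
                               w * s, h * s)
    return max_boxes, frames


# Invented numbers, purely for illustration.
data = {1990: {"A": 40.0, "B": 30.0, "C": 10.0},
        2013: {"A": 10.0, "B": 30.0, "C": 20.0}}
max_boxes, frames = comparative_treemap(data)
```

To draw it, paint `max_boxes` partially transparent and then the boxes in `frames[year]` on top. Because each country's box is anchored at the center of its maximum box, nothing changes position between years; only the sizes change.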
Another thing that is lost in this adaptation is the change in the total - that’s actually easier to see in the default arrangement, which is no small disadvantage. Nevertheless, one objective of the atlas was to really focus on country-level data, so it was a worthwhile trade-off.
All this is best demonstrated in the original static, side-by-side comparison that we included in the printed atlas (a double-page spread on pages 2 and 3), which I actually prefer to the interactive and animated versions:
Compared with the animated version, I think the comparative treemap actually works slightly better when only two years are being compared. The lightly shaded area can be directly interpreted as the other year. In the multi-year version, the shaded area doesn’t have this easy interpretation.
For the print atlas, we worked with professional publishers who were able to manually tweak things to perfection - labels, legends, captions, etc. It’s a lot of work to do that in something like D3. See how the label for Vietnam is slightly offset in 2013 so it doesn’t fall outside the darker box. That’s an attention to detail that automatic labelling can rarely achieve.
Finally, manual tweaking meant that we could rearrange the treemap slightly. See how Latin America & Caribbean (green), Europe & Central Asia (red) and Middle East & North Africa (purple) are in the center at the left? That’s not the result you get from a standard treemap layout, which would push these small areas to the bottom right. But with a bit of manual rearrangement, we were able to bring some sense of geographic topology to the chart, with South Asia “east” of Africa and Africa “south” of Europe. It’s a small, virtually unnoticeable but (to me) very pleasing hat-tip in the direction of a cartogram, without too much damage to the aesthetic conventions of treemaps.
Comparing with the regional key map we used, you can see the improvement. Does it help you read the treemap? Probably not really, but I still find it very satisfying.
One update - you can now try building these yourself using my R package: https://github.com/econandrew/ggtreemap/tree/master
It’s rough around the edges, but should work.