A New Take on Equal Area Cartograms

The tyranny of large areas and the choropleth map

Where geography consists of districts that vary significantly in size, a statistical map can easily give undue emphasis to the larger ones at the expense of the smaller. Think of Texas and Rhode Island in the US, Bavaria and Bremen in Germany, the Highlands of Scotland and the City of London in the UK, or China in east Asia. Simply on account of their size, the larger spatial units are more visible. I call this ‘the tyranny of large areas’.

In some cases, this may not matter – for agricultural or forestry statistics, the focus might be on larger districts. But for socio-economic data, where the more interesting results are commonly concentrated in the smaller districts, it can really matter.

The tyranny of large areas can be exacerbated by the area-density, or choropleth, map, where the whole area of the individual districts is coloured according to the value of the data they contain.

This is true whether count or rate data are being displayed.

Displaying count data using the choropleth method is frowned upon, because a large district is likely to contain more data (of certain, but not all sorts) simply because of its size. So total arable yields are likely to be higher in a large district simply as a result of it being larger and containing more arable land. But if you wanted to convey relative fertility, this would distort the story, which would be better represented by a rate: yields per hectare are likely to be far higher in little Kent than in the vast Highlands of Scotland. [1]

A pretty horrible example is the map below, which purports to show the global distribution of “24,000 My Little Pony: Friendship is Magic fans”. There is just so much wrong here, one would hope that it was deliberate parody. The colour ramp goes from blue (low) to red (high), but this is not sufficiently prominent. It is also a 'categorical' colour ramp, one used not to represent increasing values, but for different categories - for example pigs, sheep, butter. But what has really let the tyrants loose is that the Mercator projection has been used for a statistical map - a big no-no, as Greenland appears 14x its real size. While the data are concentrated in the States, the eye is unavoidably drawn to Canada, Greenland and Russia just because they are large, even though they have next to no data associated with them. The map might have worked (on an appropriate projection) with a colour ramp of a single hue (pink, I suppose!) of increasing intensity, with country values expressed as a percentage of the 24,000 total.

[Image: map of “My Little Pony: Friendship is Magic fans”]

Choropleths are generally reserved for rate data, where the data of interest are shown as a proportion of an appropriate denominator - for example arable yields (tons) per hectare of arable land, or births per 1,000 women of childbearing age.

But the tyrants remain untamed – large areas still dominate and small areas all but disappear. One option is to print your map BIG and examine it in detail. But this is not often practical, or even desirable. Animation can get round the problem altogether by providing pan, zoom and morphing functionalities, but I’m limiting myself here to static mapping.

The cartogram

Another solution is the cartogram, a hybrid between a map and a statistical graphic. A basic ‘area’ cartogram takes the original size and shape of each district polygon and distorts both to reflect not their geographic size, but their relative data values. So for our arable yield (rate) example, the Highlands would shrink while Kent would expand. There is a huge range of such cartograms available on the Web – for a taster, have a look at http://www.worldmapper.org/animations/wm01to02.html and also a variety of gridded examples in Danny Dorling’s video at https://youtu.be/iiDRrh_3U9g. See also http://gisgeography.com/cartogram-maps/.

The equal area cartogram

Another alternative is to represent every discrete sub-unit of your geography with an identically-shaped and -sized symbol. This is the equal area cartogram, an approach which gives the same visual impact to every district. The symbols used vary, and may be contiguous, like squares or hexagons, or separated, like circles. In the UK, they are most familiar from election mapping (see http://sasi.group.shef.ac.uk/maps/elections/local/step_thru_1974-2008.html).

There is a limiting factor to all the above maps – they can only show a single result for a single district. This need not be so. It is possible to subdivide each symbol and show shares, or proportions. There are drawbacks: it is well known that the relative shares of the slices of a circular pie chart are very difficult to assess, and rotating or reordering the pie changes perceptions. Also, hexes may easily be divided six ways, but we don’t usually deal in sixths, and the same problems of perception remain. Squares, on the other hand, are easily divided into 100 smaller squares, permitting the display of percentages. It’s quite easy to calculate values.
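As a rough illustration of how the cell counts for one subdivided square might be calculated, here is a minimal Python sketch. The function name, the district and its figures are all invented for the example, and the largest-remainder rounding is my own assumption about how to make the cells add up to exactly 100; it is not taken from any of the maps discussed here.

```python
from math import floor

def allocate_cells(counts, cells=100):
    """Split `cells` grid cells between categories in proportion to `counts`.

    Largest-remainder rounding (an assumption of this sketch) guarantees
    that the allocated cells always sum to exactly `cells`.
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    # Exact (fractional) share of the square owed to each category.
    exact = {k: v * cells / total for k, v in counts.items()}
    # Start by giving every category the whole cells it is owed.
    alloc = {k: floor(x) for k, x in exact.items()}
    leftover = cells - sum(alloc.values())
    # Hand the remaining cells to the largest fractional parts first.
    by_remainder = sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc

# Hypothetical district with three groups; the figures are made up.
district = {"A": 5230, "B": 3110, "C": 1660}
cells = allocate_cells(district)
print(cells)
```

Simple rounding of each share on its own can leave a square with 99 or 101 cells, which is why the sketch distributes the leftover cells explicitly.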

Balogh Pál - a népfajok Magyarországon

This is an old technique. The earliest example I have seen is from the huge ‘A népfajok Magyarországon’ (‘The Races in Hungary’), published in Budapest (1896). In the annexed map case is a cartogram of ‘historical Hungary’ showing each district or major city as an equally-sized square, subdivided into 100 cells. These are coloured to represent the 1890 mother-tongue Census results.

https://axioart.com/images/live_images/200/3295/17_2.jpg

Balogh Pál (overview)[2]

Balogh Pál (detail)

Many criticisms can be levelled at Balogh’s map and the data behind it – but this applies equally to every map that has sought to express Hungarian, Romanian, German or probably any other viewpoints over Transylvania (if you want to get into the mire of Transylvanian maps, have a look at my BA Hons and ESRC studentships on http://gisnatural.net/academic.html). Nevertheless, the methodology has merits.

Each district, whether a large rural expanse or a compact town, is accorded an identical visual impact. The cartogram provides both an overall impression of the distribution of the various mother tongues, and the means of easily totting up the actual local results for each mother tongue group represented, down to ¼ of 1%. There are a few squares that represent the thinly populated Carpathian uplands. Without getting into the politics here, or the advantages to the Hungarian cause of selecting this methodology, these ‘empty’ zones helped Balogh to retain an approximation to the shape of contemporary Hungary in a map made up of squares.

It is perhaps too ambitious. It is very full, and perhaps a bit overwhelming. The breakdown into ½ and ¼ percentages produces symbols that are perhaps too small to be easily readable. Also, because the distribution of the mother tongues varied across the geography, it was not possible to retain the same sequence of colour across the map. The distribution of the ‘language cells’ within each district square also appears to have been arbitrary. Finally, there is no possibility of assessing the absolute size of the populations – and there were significant differences. But it was an entirely manual process, and for its time a great achievement.

A 21st-century adaptation

I have recently adapted the approach for the modern age and added a twist.

Limiting the cartogram to the display of a single variable simplifies the visual effect, and allows for clear labelling of area name, percentage and absolute value.

I set the area with the highest value as the denominator and calculated the values of all areas as percentages of that denominator. So the area with the highest value is 100% and its entire square is shaded. The remainder are proportionally less shaded. The advantage this confers is greater differentiation: if one were to take the total across all areas as the denominator, the area with the largest value might only comprise, say, 10% of the whole, and only 10 of that area's cells would be shaded. All other areas would individually be less than this, so fewer than 10 cells would be shaded for any of them - this would be unlikely to provide sufficient differentiation on the cartogram.

But taking the largest value as the denominator allows one to see the overall pattern of distribution at a glance. The largest concentration stands out clearly. The smaller values are more clearly differentiated from the main value and from each other.
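The difference between the two denominators can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three areas, their values and the overall total of 400 are invented, and the function names and the text rendering of the square are mine rather than part of the template.

```python
def shaded_cells(values, denominator, cells=100):
    """Number of cells to shade per area for a given common denominator."""
    return {name: round(v * cells / denominator) for name, v in values.items()}

def render_square(n_shaded, side=10):
    """Draw a side x side square as text, filling cells row by row."""
    flat = ["#"] * n_shaded + ["."] * (side * side - n_shaded)
    return "\n".join("".join(flat[r * side:(r + 1) * side]) for r in range(side))

# Hypothetical values for three of many areas (invented for illustration).
areas = {"P": 40.0, "Q": 20.0, "R": 8.0}
# Hypothetical total across ALL areas, most of which are not listed here.
total = 400.0

by_max = shaded_cells(areas, max(areas.values()))  # largest value as denominator
by_total = shaded_cells(areas, total)              # overall total as denominator

print(by_max)
print(by_total)
print(render_square(by_max["R"]))
```

With the largest value as the denominator the three squares have 100, 50 and 20 shaded cells; with the total as the denominator the same areas have only 10, 5 and 2, which leaves very little room to tell them apart.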

I have not seen this approach anywhere else, and believe that I may be the first to propose or demonstrate it.

I have applied the model to the countries of EFTA, the European Union (EU), EU ‘candidate’ and ‘potential candidate’ countries, along with Moldova, Ukraine and Belarus – the westernmost members of what the EU refers to as the ‘European Neighbourhood, East’.

The difference between the largest country (Turkey, 783,000 sq. km) and the smallest (Vatican, 0.4 sq. km) is so large that the smallest would be totally invisible on a choropleth, and a proportional or graduated symbol map of population, or an area cartogram, would be equally ineffective. But an equal area cartogram allows every one of them to be shown.

I have created a template based on these countries that can be used for mapping any data capable of being rendered as a percentage for any combination of them with minimal intervention.

Two examples of the approach:

a) Population count

The country with the largest population is Germany (82.16 million). For comparison, Germany's neighbour the Czech Republic has a population of 10.55 million (13% of the German value). The concentration of populous states in Western Europe, and the great size of the population of Turkey (78.74 million), stand out.

b) GDP per capita

Here, the denominator is provided by Liechtenstein, with an astonishing GDP of $170,000 per capita; Monaco is close behind. It is instructive how far behind this level are the wealthiest significant countries - Switzerland (48%) and Norway (44%). Germany has only 25% of this level, just one percentage point behind the UK. GDP per capita is not a perfect statistic, but it does provide a very good illustration of the versatility of the methodology.

[Cartograms: population count; GDP per capita]

[Cartograms: British citizens resident in EU / EFTA; British citizens resident in Germany; British citizens resident in Spain; British citizens resident in Italy]


Challenges

Of course, with a regular geometry, one cannot perfectly reflect all the complexities of real (especially European) borders. One has to approximate. In the real world, Germany has nine neighbours. On a chessboard, each interior square has eight, edge squares have five and corners have three. So it’s impossible to reflect real geography completely accurately with squares, and hexes are more restrictive still, with a maximum of six.
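The neighbour counts quoted above are easy to check. The following Python sketch counts the cells touching a given square by an edge or a corner on an 8 x 8 board, and lists the six neighbours of a hexagon using axial coordinates; the function names and the choice of axial coordinates are mine, for illustration only.

```python
def square_neighbours(row, col, n_rows, n_cols):
    """Cells touching (row, col) by an edge or a corner on a square grid."""
    return [
        (r, c)
        for r in range(row - 1, row + 2)
        for c in range(col - 1, col + 2)
        if (r, c) != (row, col) and 0 <= r < n_rows and 0 <= c < n_cols
    ]

# The six neighbour offsets of a hexagon in axial (q, r) coordinates.
HEX_DIRECTIONS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]

def hex_neighbours(q, r):
    """Neighbours of hexagon (q, r) on an unbounded hexagonal grid."""
    return [(q + dq, r + dr) for dq, dr in HEX_DIRECTIONS]

interior = len(square_neighbours(3, 3, 8, 8))
edge = len(square_neighbours(0, 3, 8, 8))
corner = len(square_neighbours(0, 0, 8, 8))
hexagon = len(hex_neighbours(0, 0))

print(interior, edge, corner, hexagon)
```

Neither figure reaches Germany's nine, so at least one real neighbour has to be dropped or handled separately whichever shape is used.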

You can increase your chances of matching the real world if you extract the very small countries and treat them separately. So, for example, I have set Luxembourg aside. This has allowed me to correctly populate the squares around Germany with the country's other eight neighbours. Some countries have too many neighbours on the cartogram - Hungary has seven real neighbours and they are all there in the cartogram, but so too is the Czech Republic as a spurious neighbour. The ‘Balkanisation’ of the former Yugoslavian space has made construction of the cartogram in south-east Europe very challenging – note the non-contiguity of Romania and Bulgaria. But the overall shape is recognisably Europe, even if I have resorted to using Malta to suggest the Italian ‘leg’. And I will probably curve Estonia-Norway round west and south towards Denmark for the next version.
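A check of this kind can be automated. The Python sketch below compares the neighbours a country has on a grid with its real ones and reports which are missing and which are spurious. The 3 x 3 layout is hypothetical, built only to reproduce the Hungarian case described above; it is not the layout of my template, and the function names are invented for the example.

```python
def grid_neighbours(layout, name):
    """Names in cells touching `name` by an edge or a corner."""
    positions = {
        cell: (r, c)
        for r, row in enumerate(layout)
        for c, cell in enumerate(row)
        if cell  # skip empty cells (None or "")
    }
    r0, c0 = positions[name]
    return {
        cell
        for cell, (r, c) in positions.items()
        if cell != name and abs(r - r0) <= 1 and abs(c - c0) <= 1
    }

def compare_neighbours(layout, name, real):
    """Real neighbours missing from the grid, and grid neighbours that are not real."""
    on_grid = grid_neighbours(layout, name)
    return {"missing": real - on_grid, "spurious": on_grid - real}

# Hypothetical layout fragment, for illustration only.
layout = [
    ["Czech Republic", "Slovakia", "Ukraine"],
    ["Austria", "Hungary", "Romania"],
    ["Slovenia", "Croatia", "Serbia"],
]
# Hungary's seven real neighbours.
real_hungary = {"Austria", "Slovakia", "Ukraine", "Romania",
                "Serbia", "Croatia", "Slovenia"}

result = compare_neighbours(layout, "Hungary", real_hungary)
print(result)
```

Run over every country in a layout, a report like this shows at a glance where the approximation is weakest.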

The approach may not always be suitable - where the data do not vary widely across the geography, the benefits are limited.

Benefits of the approach

· All countries obtain the same degree of visual impact regardless of their actual size. Even the Vatican can be represented - and seen.

· Taking the largest value as the denominator allows for a greater variability in symbology.

· The cartogram is less cluttered than Balogh’s, allowing for clear labelling with percentages and absolute values.

· It has recently been shown that values on ‘square pie charts’ are easier to read than lines, circular or annular pie charts. So perception studies support the methodology.




[2] BALOGH Pál: A népfajok Magyarországon. [Szövegkötet + Térkép-mellékletei. I-XXII.] Tervezte: Balogh Pál. Rajzolta: Proff Kocsárd Sándor. 1902. M. Kir. Vallás- és Közoktatásügyi Minisztérium [Hornyánszky].