The right country codes are ISO alpha-3

If you’re storing countries in a database, you should use codes of some kind. Country names might seem okay, but they:

vary by language (Germany, Allemagne, Deutschland),
have synonyms even within one language (United States, United States of America),
can be ambiguous (Korea, China, Sudan, Congo) or confused with subnational regions (Ireland, Macedonia), and
are not permanent, and change more than you might expect under circumstances violent (Cambodia, Myanmar) or peaceful (Czechia, Eswatini, North Macedonia).

There are many standardized code sets—R’s countrycode package lists around 30—but only two are commonly used: 2 and 3 letter ISO codes.

Of these, you should always use the 3 letter codes (technically ISO 3166-1 alpha-3).

Excerpt of ISO 3166-1 alpha-3 from Wikipedia

Why

There are only 26 English letters, so the space of 2 letter combinations has only 26 × 26 = 676 codes. With ~250 allocated, it’s crowded, so there are a lot of potential collisions. It’s even more crowded than that, because these are not randomly assigned codes: they’re meant to sound like the country name. So in practice a code like XQ isn’t actually useful (in fact it’s part of a region reserved for user assignment).
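As a quick check on the arithmetic, here and for the three-letter case that follows, this Python sketch computes the size of each code space and how full it is. The figure of 250 allocated codes is the rough number quoted above, not an exact count, and the function names are only illustrative.

```python
LETTERS = 26
ALLOCATED = 250  # rough figure quoted in the text, not an exact count


def code_space(length: int) -> int:
    """Number of possible codes made of `length` uppercase letters."""
    return LETTERS ** length


def occupancy(length: int, allocated: int = ALLOCATED) -> float:
    """Fraction of the code space taken up by allocated codes."""
    return allocated / code_space(length)


for length in (2, 3):
    print(f"{length}-letter codes: {code_space(length):,} possible, "
          f"{occupancy(length):.1%} occupied")
```

Under that assumption, roughly 37% of the two-letter space is occupied, against about 1.4% of the three-letter space.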

Three letters, on the other hand, gives you 26 × 26 × 26 ≈ 17,000 possibilities, so this is a much more sparsely populated space, and therefore has more redundancy and is more robust to errors. That gives two concrete advantages:

3-letter codes are easier for humans to guess at

You might think that country codes should always be translated into country names for presentation, but in practice that’s not always the case. To take one example, domain names show untranslated not-quite-2-letter-ISO country codes to end users (admittedly, sometimes divorced from the country semantics).

Two letters is not really enough to unambiguously establish a country name. Take these examples:

BD is Bangladesh, not to be confused with Burundi (BI), which itself shouldn’t be confused with BN (Brunei) (vs BGD, BDI, BRN)
CA is Canada, not Cameroon (CM), which narrowly avoids Cambodia (KH) for historical reasons (vs CAN, CMR, KHM)
FI is Finland, not Fiji (FJ) (vs FIN, FJI)
Ukraine is UA (what? why is A the second choice here?) not UK, even though the official code of the United Kingdom is GB. The UAE skirts the whole mess by using AE.

Even when codes are unambiguous, two letters are often insufficient to easily bring a country name to mind:

AO is not so obviously Angola as AGO is.
Because of how it’s pronounced (in English, at least), GE does not bring to mind Georgia, whereas GEO at least has a fighting chance.
Two letter codes are usually made up of letters from the three letter code, in the same order, but in Ireland’s case that’s not true. And IRL is far more obvious than IE.

Of course even with 3-letter codes, it’s hard to remember that IND is India and IDN is Indonesia, or that Australia is AUS and Austria AUT, or that ZMB is Zambia and not Zimbabwe (ZWE).¹
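One rough way to put a number on "easy to confuse" is to count the positions at which two codes differ (the Hamming distance). That measure is my own choice for illustration, not something the standard defines, and the code lists below are just the examples from this section rather than the full ISO tables.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def close_pairs(codes, max_distance=1):
    """Pairs of codes that differ in at most `max_distance` positions."""
    return [
        (a, b)
        for a, b in combinations(sorted(codes), 2)
        if hamming(a, b) <= max_distance
    ]


# Sample codes from the examples above (not the full ISO lists).
alpha2 = ["BD", "BI", "BN", "CA", "CM", "KH", "FI", "FJ"]
alpha3 = ["BGD", "BDI", "BRN", "CAN", "CMR", "KHM", "FIN", "FJI"]

print(close_pairs(alpha2))
print(close_pairs(alpha3))
print(hamming("AUS", "AUT"), hamming("IND", "IDN"))
```

On this small sample, six pairs of two-letter codes are a single letter apart, while none of the three-letter codes are. The last line is a reminder that three letters are not a cure-all: AUS and AUT still differ by only one letter, and IND and IDN by a swap of two.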

3-letter codes are harder for machines to misinterpret

For the same reasons, two letter codes are more likely to collide with other non-country identifiers. There are some well-known examples:

In R, Namibia (NA) might get interpreted as “not available.”
In YAML, an unquoted Norway (NO) will get interpreted as the boolean “false” (under the YAML 1.1 rules that many parsers still follow).

Obviously an appropriate data format will avoid these issues, but sometimes it’s hard to control how data will go out into the world, and three letter codes are just that bit more robust.
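If you are stuck with two-letter codes, one defensive option is to flag any code that matches a token some tools treat specially, so it can be quoted or handled explicitly before export. The sketch below is a minimal version of that idea; the token list is illustrative rather than exhaustive, and `risky_codes` is a hypothetical helper, not part of any library.

```python
# Tokens that some tools read as special values rather than plain strings.
# Illustrative only: real parsers differ, and this list is not exhaustive.
SPECIAL_TOKENS = {
    "NA",     # R's missing-value marker
    "NO",     # boolean false in YAML 1.1
    "ON",     # boolean true in YAML 1.1
    "YES",    # boolean true in YAML 1.1
    "OFF",    # boolean false in YAML 1.1
    "TRUE",
    "FALSE",
    "NULL",
    "NAN",
}


def risky_codes(codes):
    """Return the codes that coincide with a known special token."""
    return [code for code in codes if code.upper() in SPECIAL_TOKENS]


# Namibia, Norway, and Germany in both code sets.
print(risky_codes(["NA", "NO", "DE"]))
print(risky_codes(["NAM", "NOR", "DEU"]))
```

For this sample the two-letter list flags NA and NO, while the three-letter list flags nothing.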

¹ My rule of thumb for disambiguating is that the more populous country gets the more obvious prefix-style code (e.g. INDia), while the less populous country gets some other rule, such as “first letter of each syllable” or prefix-suffix (e.g. InDoNesia). This works for many examples, e.g. IRN vs IRQ, but probably not all.

Comments (2)

Hey Andrew!

Came upon this article when searching for the preferred way to set country codes.

Your reasoning is strong, and very valid. However, how would you handle a database that has both country codes, and currency codes?

Currency codes are 3-letter, and the first 2 letters come from the ISO alpha-2 country code. In this case, I think it would be better to store country codes in alpha-2, so you can easily associate with the corresponding currency.

What do you think?

Gonçalo Dias 2022-12-24 10:09:05 -0500

And if you want to store language codes, not country codes, you can use https://iso639-3.sil.org/code_tables/639/data

InfoLibre 2025-03-02 06:15:47 -0500
