Not so basic statistics: life expectancy

I believe strongly in simple models and straightforward measures. But even the simplest measures can hide complexity and strong assumptions. This was brought home to me by two stories that have been in the news in the last fortnight, relating to two of the most common numbers you will come across in social science research and reporting: the unemployment rate and life expectancy. Both news stories raised questions about how these numbers were interpreted.

This is the first of what may become a series of posts, and focuses on life expectancy.

Trending #8 on BBC News recently was “How in a single year did life expectancy in the US drop by 12 years?”, a clickbait title for a story about the Spanish flu of 1918, which swept across the world in the wake of the First World War.

The 12 year drop caught my attention, so I dug around, and – confusingly – this statistic is both true and highly misleading. Like the “interest rate”, there is no single thing called “life expectancy”. Media reports rarely explain which version of life expectancy they mean – in this case the BBC links to a US National Archives webpage which is no more specific, and doesn’t cite a source. That’s frustrating, because there are important assumptions embedded in the ‘standard’ quoted life expectancy measure, which, spelt out, is ‘period life expectancy at birth’.

The actual source for the 12 years figure seems to be this work by demographers at Berkeley. You can see from Figure 2 that, sure enough, life expectancy drops from 48, in 1917, to 36, in 1918 (you can check that from their data, here). So the BBC article is right. But it’s nonetheless misleading.

Life expectancy is usually calculated ‘at birth’

To see why the ‘at birth’ part matters, consider the latest (2012) “life expectancy in the UK”, as Google reports it: 81.5. This does not mean that an “average” person plucked from the UK will live to 81.5. When we talk about life expectancy, we usually mean the expected number of years of life remaining for somebody aged zero in the reference year; demographers denote this e(0). An older person will generally have fewer expected years of life remaining, although where infant mortality is high, expected years remaining may actually go up over the first year or two, because infancy is a particularly dangerous time.

(This is exemplified in a great short piece from the BBC News Head of Stats, Anthony Reuben, pointing out that in 1964, and most years prior to that, the modal (i.e. most common) age of death in the UK was zero. That’s a tragic number. It’s like a statistician’s equivalent of the ‘six-word novel’ attributed to Hemingway: For sale: Baby shoes, never worn. Most common age at death: zero.)

Age (x)    Male life expectancy at age x, e(x)
0          75.40
1          74.84
2          73.87
...        ...
100        1.89


We can see this happening in a typical life table, for the US in 2010 (Table 6, pp. 59-61 of this US Social Security Administration publication), part of which I’ve reproduced above. The life expectancy at birth for males, the first number, is 75.40. It’s a reasonable reflection of the public health of a country or civilization, but it’s directly relevant to practically nobody (nobody old enough to be reading this blog, anyway). Now follow down the column. Life expectancy at age 1 is 74.84, which means that the average 1-year-old boy in 2010 had 74.84 years of life remaining (not quite, but more on this later). Add this to the year he has already lived, and you reach an expected lifespan of 75.84 years. So 1-year-olds, on average, will live slightly longer than newborns. If this seems counterintuitive, remember that these 1-year-olds have already banked a year of life, successfully avoiding the many fatal risks they faced in their first year, whereas the newborns are yet to navigate this minefield. Go to the very end of the table, and you can see that centenarians have another 1.89 years remaining, for an average lifespan of 101.89 years.
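The “add the years already lived” step is simple enough to sketch in a few lines of Python. The e(x) values are the four rows quoted from the life table above; the function and variable names are purely illustrative and do not come from the SSA publication.

```python
# Remaining life expectancy e(x) for US males in 2010,
# copied from the four life-table rows quoted above.
remaining_years = {0: 75.40, 1: 74.84, 2: 73.87, 100: 1.89}


def expected_age_at_death(age, e_x):
    """Years already lived plus expected years remaining."""
    return age + e_x


expected_lifespan = {
    age: round(expected_age_at_death(age, e_x), 2)
    for age, e_x in remaining_years.items()
}

for age, total in expected_lifespan.items():
    print(f"age {age:>3}: e(x) = {remaining_years[age]:5.2f}, "
          f"expected age at death = {total:6.2f}")
```

The expected age at death rises as you move down the table even though e(x) itself falls, which is the whole point: surviving a risky year changes what “average” means for you.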

So, the first crucial thing to realise about life expectancy is this ‘at birth’ calculation. This can have important effects on interpretation. A classic example is the common belief that life in the prehistoric world was, to paraphrase Hobbes, short, and that prehistoric societies were therefore very young, with the stages of life dramatically accelerated, and very few elderly people. At first glance it seems that historical analysis of demographics backs this up, with estimates of prehistoric life expectancy as low as 25. If life expectancy is 25, goes the thinking, then a 30-year-old would be a white-haired elder.

But this is fallacious reasoning: very high infant mortality in prehistoric times has an outsize effect on the mean. Infant mortality in Roman times might have been as high as 300 per 1000, around 30 times higher than in developed societies today. Simple maths suggests that with 30% of people dying at age 0, the other 70% must have lasted a lot longer (about 36 years on average, since 25 / 0.7 ≈ 35.7) to result in a life expectancy at birth of 25. In fact, if you made it to puberty in the prehistoric world your life expectancy was probably 40+, and people of 50, 60 or 70, while uncommon, would not have been unheard of.
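The “simple maths” here is a weighted average solved for the unknown group. Here is a minimal sketch in Python, assuming (as the paragraph does) that everyone who dies in infancy is counted as dying at exactly age 0. The function name is hypothetical.

```python
def mean_age_of_survivors(e0, infant_death_share, infant_age_at_death=0.0):
    """Solve e0 = p * a_infant + (1 - p) * a_survivors for a_survivors.

    e0                  -- life expectancy at birth (mean age at death overall)
    infant_death_share  -- p, the share of people who die in infancy
    infant_age_at_death -- a_infant, assumed to be exactly 0 by default
    """
    return (e0 - infant_death_share * infant_age_at_death) / (1.0 - infant_death_share)


survivor_mean = mean_age_of_survivors(e0=25.0, infant_death_share=0.30)
print(round(survivor_mean, 1))  # 35.7
```

This only separates out deaths at age 0. Child mortality after the first year was also high, which is why the figure for those who reached puberty is higher still.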

Life expectancy usually reflects an artificial, stationary world with no medical progress

The second crucial thing to understand about life expectancy is perhaps even more important. It involves the difference between ‘cohort’ life tables and ‘period’ life tables. Cohort tables use probabilities that reflect the actual conditions that would apply to a person at each year of their actual life. So, for example, the cohort life expectancy of a child born in the US in 1918 would take account of the discovery of penicillin in 1928 (age 10) and the development of a polio vaccination in 1952 (age 34) – both events that had a major impact on his or her health and life expectancy. Period tables do not account for this. They are a completely artificial construction, applying the risks present in the base year to every year of the theoretical person’s life. So the period life expectancy at birth in 1918 acts as if medical science and public health remained forever frozen at 1918 levels.
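To make the distinction concrete, here is a rough Python sketch of how the two kinds of life expectancy are computed from the same set of death probabilities. Everything numeric in it is invented for illustration: the mortality curve, the 1% yearly improvement and the assumption that deaths fall mid-year on average are hypothetical choices, not values from any real life table.

```python
import math

MAX_AGE = 110  # toy cut-off; the hypothetical curve reaches certain death before this


def death_prob(age, year, base_year=1918, yearly_improvement=0.01):
    """Hypothetical chance of dying between `age` and `age + 1` during `year`.

    Risk rises exponentially with age, has an extra bump in infancy,
    and shrinks by `yearly_improvement` per calendar year ("medical progress").
    """
    baseline = 0.0003 * math.exp(0.08 * age)
    if age == 0:
        baseline += 0.10
    progress = (1.0 - yearly_improvement) ** (year - base_year)
    return min(1.0, baseline * progress)


def life_expectancy_at_birth(death_probs_by_age):
    """Expected years lived, assuming deaths occur mid-year on average."""
    alive = 1.0   # share of the original births still alive
    years = 0.0   # accumulated expected years lived
    for q in death_probs_by_age:
        years += alive * (1.0 - q / 2.0)
        alive *= 1.0 - q
    return years


def period_e0(year):
    # Period: the risks of one calendar year applied to every age.
    return life_expectancy_at_birth(
        [death_prob(age, year) for age in range(MAX_AGE + 1)]
    )


def cohort_e0(birth_year):
    # Cohort: at each age, the risks of the year in which that age is reached.
    return life_expectancy_at_birth(
        [death_prob(age, birth_year + age) for age in range(MAX_AGE + 1)]
    )


period_1918 = period_e0(1918)
cohort_1918 = cohort_e0(1918)
print(f"period e(0), 1918: {period_1918:.1f}")
print(f"cohort e(0), 1918: {cohort_1918:.1f}")
```

The only difference between the two functions is which calendar year supplies the risk at each age. Whenever mortality improves over time, the cohort figure comes out higher than the period figure for the same birth year.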

This, then, explains why you see such a dramatic single year change in life expectancy in 1918 – it’s the period figure. It was undoubtedly an extremely risky moment to be alive, but the Spanish flu did, inevitably, pass, and if you survived it, your lifetime risks remained basically unchanged. So life expectancy returned to trend in the couple of years during which the epidemic played itself out.
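A toy calculation shows why a one-year shock hits the period figure so much harder than the cohort figure. The sketch below is in Python and uses entirely made-up numbers: a hypothetical mortality curve that is constant over time, plus a hypothetical extra 2% chance of death at every age in 1918 only. It is not calibrated to the actual epidemic.

```python
import math

MAX_AGE = 110
EPIDEMIC_YEAR = 1918
EXCESS_DEATH_PROB = 0.02  # hypothetical extra risk at every age in the epidemic year


def death_prob(age, year):
    """Hypothetical chance of dying between `age` and `age + 1` during `year`."""
    baseline = 0.0003 * math.exp(0.08 * age)
    if age == 0:
        baseline += 0.10  # extra risk in infancy
    if year == EPIDEMIC_YEAR:
        baseline += EXCESS_DEATH_PROB
    return min(1.0, baseline)


def e0(year_lived_at_age):
    """Life expectancy at birth; `year_lived_at_age(age)` picks whose risks apply."""
    alive, years = 1.0, 0.0
    for age in range(MAX_AGE + 1):
        q = death_prob(age, year_lived_at_age(age))
        years += alive * (1.0 - q / 2.0)  # deaths assumed mid-year on average
        alive *= 1.0 - q
    return years


def period_e0(year):
    # Every age faces the risks of the same calendar year.
    return e0(lambda age: year)


def cohort_e0(birth_year):
    # Each age faces the risks of the year in which it is actually lived.
    return e0(lambda age: birth_year + age)


period_drop = period_e0(1917) - period_e0(1918)
cohort_drop = cohort_e0(1919) - cohort_e0(1918)
print(f"period e(0) drop, 1917 to 1918: {period_drop:.1f} years")
print(f"cohort e(0) gap, born 1919 vs born 1918: {cohort_drop:.1f} years")
```

The period calculation pretends that a baby born in 1918 lives through an epidemic year at every age, so the shock is applied more than a hundred times over. The cohort born in 1918 meets it once, at age 0, and then returns to normal risks.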

Above I’m comparing, side by side, data from Tables 10 and 11 of the SSA report: life expectancy at birth, on a period and cohort basis respectively, for males and females (I’ve copied this data into a Google doc). Whereas the period data shows the striking decline (7 years here, not 12 years – I haven’t looked into this discrepancy) mentioned in the BBC headline, the cohort data – which, remember, reflect the actual lived reality of people born in those years – show only a small kink. It is still impressive, and scary, that a microscopic strand of RNA can have this effect on human populations – but it’s nothing like as stark as the ‘12 year’ drop reported.

This leaves only the question of why, if they’re so unrealistic, demographers use period tables. I’m not a demographer, but I assume the reason is that they are straightforward and objective – they can be calculated directly on the basis of death records for that year. Cohort tables, on the other hand, require mortality data for the entire period of a cohort’s life. This may be hard enough for historical work (try assembling any data set over 100 years), but to produce such tables for any cohort which still has living members is harder still, requiring forecasts of future mortality. And since forecasts are usually wrong, such tables will inevitably be less than reliable. Consider that a prospective cohort life expectancy in 1950 might reasonably have assumed a 50% chance of nuclear annihilation in the next decade; reasonable people were predicting this. And to calculate a cohort life expectancy for somebody born today, we would need to predict the course of medical technology, war, and societal dynamics out beyond 2100. That is a long time horizon for science fiction writers, let alone social scientists.

So instead, we’re stuck with period life expectancy at birth as our default measure. Which is fine, as long as we remember how it’s constructed, and what it does and doesn’t reflect.


