The web has been awash with speculation to supplement the few details that have been revealed about the PRISM program, whereby the NSA received substantial data from various Internet companies including Microsoft, Google and Apple. Corey Chivers at bayesianbiologist has posted some speculative analysis of the PRISM program. He explores some interesting ideas, but his argument is too simplistic.

The argument relates to a common error of reasoning called the base rate fallacy. It works in the following way: suppose a particular cancer strikes 1 in 1000 in the general population, at random. Now, suppose we have a screening test for the cancer which is advertised as ‘99% effective’; more precisely:

- it will correctly detect 99 out of 100 cancers (false negative rate = 0.01)
- it will incorrectly detect cancer in 2 out of 100 healthy people (false positive rate = 0.02)

These are both bad outcomes, in different ways. With most screening tests there is some threshold parameter which can be used to trade the false positive rate against the false negative rate. Set the threshold higher, and you increase the false negative rate; set it lower, and you increase the false positive rate.
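The trade-off can be seen with a tiny sketch (the scores, labels, and threshold values here are entirely invented for illustration):

```python
# Toy illustration: each subject gets a test score, and we call "positive"
# any score above a chosen threshold. All numbers are made up.
sick    = [0.9, 0.8, 0.7, 0.4]   # scores for subjects who have the disease
healthy = [0.6, 0.3, 0.2, 0.1]   # scores for subjects who don't

def rates(threshold):
    """Return (false negative rate, false positive rate) at this threshold."""
    false_neg = sum(s <= threshold for s in sick) / len(sick)
    false_pos = sum(s > threshold for s in healthy) / len(healthy)
    return false_neg, false_pos

low_fn, low_fp = rates(0.25)    # low threshold: no misses, more false alarms
high_fn, high_fp = rates(0.65)  # high threshold: more misses, no false alarms
```

Moving the threshold up or down slides errors from one column to the other; it cannot reduce both at once.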

Anyway, this might look like a pretty effective test. If you tested positive for the cancer, you’d be worried. But what is the probability that you actually have cancer?

If you said 99%, you’re wrong, but you’re in good company. I once tutored an Information Security class for advanced undergraduate computer scientists, and most got it wrong. It helps to understand Bayes’ theorem to see why it’s wrong, but that’s not really necessary: common sense works just as well. We can use a ‘contingency table’ to see what’s going on. I’m going to assume a population of 1000, although since we’re really dealing with proportions, that choice is arbitrary. First fill in the general rate of cancer in the population (the ‘base rate’).

| (Of 1000 people) | Tested positive | Tested negative | Total |
|------------------|-----------------|-----------------|-------|
| With cancer      |                 |                 | 1     |
| Without cancer   |                 |                 | 999   |
| Total            |                 |                 | 1000  |

Now, we have to carefully add the false positive / false negative information. Think about it a little, and you should see that the percentages apply to the row totals, not the grand total.

| (Of 1000 people) | Tested positive | Tested negative | Total |
|------------------|-----------------|-----------------|-------|
| With cancer      | 99% of 1        | 1% of 1         | 1     |
| Without cancer   | 2% of 999       | 98% of 999      | 999   |
| Total            |                 |                 | 1000  |

Now we can go ahead and work out these percentages, and sum the columns (the rows and columns must sum up in a table like this).

| (Of 1000 people) | Tested positive | Tested negative | Total |
|------------------|-----------------|-----------------|-------|
| With cancer      | 0.99            | 0.01            | 1     |
| Without cancer   | 19.98           | 979.02          | 999   |
| Total            | 20.97           | 979.03          | 1000  |

So now we can see the answer to our question: of the 20.97 people who tested positive, only 0.99 actually had cancer, for a rate of 4.7%. So in this situation, even if you test positive for cancer, there’s only a 4.7% chance that you have it – the other 95.3% are false positives. That is a very different prospect from 99%. Returning to the maths for a second, this is because P(cancer | positive) = P(positive | cancer) × P(cancer) / P(positive) = (0.99 × 0.001) / 0.02097 ≈ 0.047, which is quite different from P(positive | cancer) = 0.99.
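The contingency-table arithmetic above is just Bayes’ theorem in disguise; here is a short sketch of the same calculation (the function and parameter names are mine):

```python
def bayes_posterior(base_rate, sensitivity, false_pos_rate):
    """P(cancer | positive test), i.e. the positive predictive value."""
    true_pos = sensitivity * base_rate            # P(positive AND cancer)
    false_pos = false_pos_rate * (1 - base_rate)  # P(positive AND no cancer)
    return true_pos / (true_pos + false_pos)

p = bayes_posterior(base_rate=0.001, sensitivity=0.99, false_pos_rate=0.02)
# ≈ 0.047, matching the 0.99 out of 20.97 from the table
```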

This kind of thing happens whenever the ‘base rate’ of a particular attribute is very low, and this is the argument Corey is making about PRISM. Since very few people are terrorists (say 1 in 1 million), a screening test would need to be extraordinarily sensitive not to be overwhelmed by false positives. This is often a good argument against various screening and profiling measures, but it’s not so relevant here.

The difference here is that the NSA is probably not simply screening every Facebook user for terrorist-like attributes. They’re far more likely to be starting with a list of ‘known bad guys’, and focussing on their associates (Facebook friends, Gmail correspondents, etc.). By intelligently narrowing the target group, they can vastly increase the base rate. Continuing my cancer example, imagine now that this cancer is hereditary, so that if one of your parents had it, your chances of developing it are 1 in 10. We might now choose to screen only those in this target subgroup (suppose it is of size 100), and now our table looks like this:

| (Of 100 people) | Tested positive  | Tested negative  | Total |
|-----------------|------------------|------------------|-------|
| With cancer     | 0.99 × 10 = 9.9  | 0.01 × 10 = 0.1  | 10    |
| Without cancer  | 0.02 × 90 = 1.8  | 0.98 × 90 = 88.2 | 90    |
| Total           | 11.7             | 88.3             | 100   |

Now the cancer is still more uncommon than not in the target group, at 10%. But the probability that you have it, given that you tested positive, is now 9.9 out of 11.7 – a huge leap to 85%. That really is something to worry about. It is for exactly this reason that many screening tests are only recommended for certain high-risk subgroups of the general population.
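The subgroup table can be rebuilt in code, using the same numbers as the example:

```python
# High-risk subgroup from the example: hereditary cancer, 1-in-10 base rate.
population = 100
base_rate = 0.10
sensitivity = 0.99     # same test as before
false_pos_rate = 0.02

with_cancer = population * base_rate               # 10
without_cancer = population - with_cancer          # 90

true_positives = sensitivity * with_cancer         # 9.9
false_positives = false_pos_rate * without_cancer  # 1.8

p_cancer_given_positive = true_positives / (true_positives + false_positives)
# ≈ 0.85, versus ≈ 0.047 when screening the general population
```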

The NSA can do the same thing with the PRISM data: target high-risk subgroups. In particular, even if only 1 person in 1 million is a terrorist, it is plausible that the relevant base rate is much higher if we target the associates of a known (or even suspected) terrorist. If we can increase the base rate from 1 in 1 million to 1 in 10,000, then this might have substantial effects on the rate of false positives, as we saw above (for a similar 100-fold increase in the base rate). So for this reason, I don’t think the base rate fallacy really applies here. Now I’m sure the US Government list of ‘known terrorists’ is highly imperfect (stories abound of people being targeted because they share the name of a suspected person, for instance) – but it is bound to be better than starting from nothing.
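The effect of the 100-fold base rate increase can be sketched directly as a false-to-true positive ratio (the 99% sensitivity and 1% false positive rate below are the hypothetical figures from the discussion, not real capabilities):

```python
def false_to_true_ratio(base_rate, sensitivity=0.99, false_pos_rate=0.01):
    """Expected false positives per true positive at a given base rate."""
    return (false_pos_rate * (1 - base_rate)) / (sensitivity * base_rate)

broad = false_to_true_ratio(1e-6)     # screen everyone: ~10,000 false alarms per real hit
targeted = false_to_true_ratio(1e-4)  # target associates: ~100 false alarms per real hit
```

The same classifier goes from hopeless to merely noisy purely by narrowing whom it is applied to.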

There are two further brief comments I want to make about Corey’s analysis.

First, suppose the NSA really did do the naive screening thing, on, say, Facebook messages. What kind of false positive / false negative rate might they expect? Corey assumes 1% false positives and 1% false negatives, but that seems quite conservative. A comparable task is spam filtering (instead of ‘is this a terrorist message’ we ask ‘is this spam’). A quick check of my Gmail spam folder suggests I got about 700 spam emails in the last 30 days, and (recall bias notwithstanding) I don’t recall any slipping through the net: that’s a false negative rate of better than 0.2%. Meanwhile, I can’t recall the last time a genuine message was classified as spam, and I get maybe 1000 emails a month: so the false positive rate must be minuscule. Granted, the NSA does not have the huge classified corpus of terrorist (spam) messages that Google does, but they’re also probably working harder on this. In the world of machine learning problems, classifying texts is about as easy as it gets.
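To give a flavour of why text classification is considered easy, here is a toy bag-of-words Naive Bayes sketch in the spam-filter spirit – the corpus and vocabulary are entirely invented, and a real filter would be far more sophisticated:

```python
import math
from collections import Counter

# Invented toy corpus; class priors are equal (3 docs each) so they cancel.
spam_docs = ["win cash now", "cheap cash offer", "win a prize now"]
ham_docs  = ["meeting at noon", "lunch at noon tomorrow", "see you at the meeting"]

def word_counts(docs):
    return Counter(w for d in docs for w in d.split())

spam_counts = word_counts(spam_docs)
ham_counts = word_counts(ham_docs)
vocab_size = len(set(spam_counts) | set(ham_counts))

def log_likelihood(doc, counts):
    # Laplace-smoothed log-probability of the document's words under one class
    total = sum(counts.values())
    return sum(math.log((counts[w] + 1) / (total + vocab_size))
               for w in doc.split())

def classify(doc):
    spam_score = log_likelihood(doc, spam_counts)
    ham_score = log_likelihood(doc, ham_counts)
    return "spam" if spam_score > ham_score else "ham"
```

Even this naive word-counting scheme separates the two toy classes cleanly; Google's production filters, trained on vast labelled corpora, do far better.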

Second, the consideration of false positives sounds important, but it’s really meaningless without attaching some costs (economic or otherwise) to the various outcomes. Corey’s false:true positive ratio of 10,000 to 1 sounds unmanageable, and if the response to a positive is to dispatch a Predator drone laden with Hellfire missiles, then it probably is. But if you choose a more measured response, like escalating the case to a 23-year-old NSA analyst, this ratio might be perfectly manageable (the agency may have 50,000 employees). In just the same way, doctors will often use a less sensitive screening test to refer patients to a more reliable (e.g. radiological) investigation, before committing to drastic action.
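The two-stage idea can be sketched by chaining Bayes updates: the posterior after the cheap screen becomes the prior for the costlier follow-up (the stage-2 sensitivity and false positive rate below are hypothetical, chosen only to illustrate the mechanism):

```python
def bayes_posterior(prior, sensitivity, false_pos_rate):
    """P(condition | positive test), by Bayes' theorem."""
    true_pos = sensitivity * prior
    false_pos = false_pos_rate * (1 - prior)
    return true_pos / (true_pos + false_pos)

# Stage 1: the cheap screen from the running example, 1-in-1000 base rate
after_screen = bayes_posterior(0.001, 0.99, 0.02)            # ≈ 0.047
# Stage 2: a costlier, more specific follow-up (hypothetical 95% / 0.1% rates),
# applied only to stage-1 positives, so the stage-1 posterior is the new prior
after_followup = bayes_posterior(after_screen, 0.95, 0.001)  # ≈ 0.98
```

Two mediocre tests in series, each cheap enough to run at its own scale, end up far more conclusive than either alone.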

So for all three reasons, I think Corey vastly underestimates the potential effectiveness of a PRISM-like scheme. However, since nobody with a security clearance is going to enter this debate, all I have is my assumptions: so I can’t promise my analysis is any closer to the truth than his. Ultimately, we have no idea. But what I can promise is that the intelligence community is well aware of statistical fallacies, and I would imagine that America’s largest employer of mathematicians has thought pretty hard about this topic.