Stanford Study: It Is Trivially Easy to Identify People With Metadata
When the NSA’s bulk collection of every single American’s phone records was disclosed this past summer, defenders of the program argued it was not invasive surveillance because it’s only metadata (who you called, when, and for how long) and doesn’t include the identity of the callers or the content of the conversation. “There are no names, there’s no content in that database,” Obama said in June.
A new study at Stanford University has just ripped that argument to shreds.
Stanford computer scientists Jonathan Mayer and Patrick Mutchler found that it is “trivially” easy to determine the identity of callers if all you have is metadata. They write about their research in a blog post:
So, just how easy is it to identify a phone number?
Trivial, we found. We randomly sampled 5,000 numbers from our crowdsourced MetaPhone dataset and queried the Yelp, Google Places, and Facebook directories. With little marginal effort and just those three sources, all free and public, we matched 1,356 (27.1%) of the numbers. Specifically, there were 378 hits (7.6%) on Yelp, 684 (13.7%) on Google Places, and 618 (12.3%) on Facebook.
What about if an organization were willing to put in some manpower? To conservatively approximate human analysis, we randomly sampled 100 numbers from our dataset, then ran Google searches on each. In under an hour, we were able to associate an individual or a business with 60 of the 100 numbers. When we added in our three initial sources, we were up to 73.
How about if money were no object? We don’t have the budget or credentials to access a premium data aggregator, so we ran our 100 numbers with Intelius, a cheap consumer-oriented service. 74 matched. Between Intelius, Google search, and our three initial sources, we associated a name with 91 of the 100 numbers.
If a few academic researchers can get this far this quickly, it’s difficult to believe the NSA would have any trouble identifying the overwhelming majority of American phone numbers.
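One detail in the figures quoted above is worth spelling out: the per-directory hits (378 + 684 + 618 = 1,680) add up to more than the 1,356 combined matches, which means some numbers turned up in more than one directory. The Python sketch below shows how a combined match rate of that kind can be computed as a set union. It is a minimal illustration under stated assumptions, not the researchers' code: the function name `match_summary`, the directory names, and the phone numbers are all hypothetical.

```python
from typing import Dict, Set


def match_summary(sample: Set[str], directories: Dict[str, Set[str]]) -> Dict[str, float]:
    """Return per-directory and combined match rates, as percentages of the sample."""
    summary: Dict[str, float] = {}
    matched_any: Set[str] = set()
    for name, listed in directories.items():
        hits = sample & listed      # sampled numbers found in this directory
        matched_any |= hits         # union, so a number found twice counts once
        summary[name] = round(100 * len(hits) / len(sample), 1)
    summary["combined"] = round(100 * len(matched_any) / len(sample), 1)
    return summary


# Toy data invented for illustration only; none of it comes from the study.
sample = {"555-0100", "555-0101", "555-0102", "555-0103", "555-0104"}
directories = {
    "directory_a": {"555-0100", "555-0101"},
    "directory_b": {"555-0101", "555-0102"},
}
result = match_summary(sample, directories)
print(result)
```

In this toy case each directory matches two of the five numbers, but one number appears in both, so the combined rate counts three distinct numbers rather than four.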
This shouldn’t be too surprising to anyone who has been paying attention. When the Snowden leaks broke, NSA whistleblower William Binney took issue with arguments like Obama’s that metadata isn’t revealing. Binney said collecting metadata can be more revealing than listening in on the content of phone calls.
This study is just another in a long line of definitive knock-downs of pro-NSA arguments. The transparency that Snowden’s leaks have imposed on the government and its defenders has mortally embarrassed them and allowed each of their arguments – which we would otherwise have had to take at their word – to be disproven.
That is true in general, but it is especially true of the metadata program. The disclosure of this program proved James Clapper, the Director of National Intelligence, to be a bald-faced liar, given that he testified to Congress that no such program existed. Then NSA chief Gen. Keith Alexander said the metadata program foiled 54 terrorist plots, a justification that was later proven (and admitted by Alexander) to be completely false. Then they said it was perfectly legal, until we found out that a 2011 FISC ruling had found that the NSA “frequently and systematically violated” the statutes restricting how intelligence agents can search databases of Americans’ telephone communications. To add to that, a federal judge essentially ruled it unconstitutional. And now we discover that their “metadata-isn’t-really-invasive” argument is also baloney.
Before Edward Snowden, NSA overreach was, to borrow a phrase, an unknown unknown. After Edward Snowden, they have to lie about it…repeatedly…apparently without a whiff of shame.