Stanford Study: It Is Trivially Easy to Identify People With Metadata

John Glaser, December 26, 2013

When the NSA’s bulk collection of every single American’s phone records was disclosed this past summer, defenders of the program argued it was not invasive surveillance because it’s only metadata (who you called, when, and for how long) and doesn’t include the identity of the callers or the content of the conversation. “There are no names, there’s no content in that database,” Obama said in June.

A new study at Stanford University has just ripped that argument to shreds.

Stanford computer scientists Jonathan Mayer and Patrick Mutchler found that it is “trivially” easy to determine the identity of callers if all you have is metadata. They write about their research in a blog post:

So, just how easy is it to identify a phone number?

Trivial, we found. We randomly sampled 5,000 numbers from our crowdsourced MetaPhone dataset and queried the Yelp, Google Places, and Facebook directories. With little marginal effort and just those three sources—all free and public—we matched 1,356 (27.1%) of the numbers. Specifically, there were 378 hits (7.6%) on Yelp, 684 (13.7%) on Google Places, and 618 (12.3%) on Facebook.

What about if an organization were willing to put in some manpower? To conservatively approximate human analysis, we randomly sampled 100 numbers from our dataset, then ran Google searches on each. In under an hour, we were able to associate an individual or a business with 60 of the 100 numbers. When we added in our three initial sources, we were up to 73.

How about if money were no object? We don’t have the budget or credentials to access a premium data aggregator, so we ran our 100 numbers with Intelius, a cheap consumer-oriented service. 74 matched. Between Intelius, Google search, and our three initial sources, we associated a name with 91 of the 100 numbers.
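
The matching procedure described above amounts to taking the union of hits across several lookup sources. Here is a minimal sketch, assuming each source is a simple mapping from phone number to name. The function name `match_rate`, the directory contents, and the phone numbers are all hypothetical; this is not the study's code or data.

```python
def match_rate(sample, directories):
    """Return (matched_numbers, per_source_hits) for a sample of phone numbers.

    `directories` maps a source name to a dict of {phone_number: name}.
    A number counts as matched if at least one source knows it.
    """
    per_source_hits = {
        source: {n for n in sample if n in listing}
        for source, listing in directories.items()
    }
    # Union of the per-source sets: a number found in two sources counts once.
    matched = set().union(*per_source_hits.values())
    return matched, per_source_hits


# Hypothetical toy data, not taken from the study.
sample = ["555-0100", "555-0101", "555-0102", "555-0103"]
directories = {
    "yelp": {"555-0100": "Corner Cafe"},
    "places": {"555-0100": "Corner Cafe", "555-0101": "City Dental"},
    "facebook": {"555-0102": "A. Example"},
}

matched, hits = match_rate(sample, directories)
for source, found in hits.items():
    print(f"{source}: {len(found)} hits ({len(found) / len(sample):.1%})")
print(f"combined: {len(matched)} of {len(sample)} ({len(matched) / len(sample):.1%})")
```

Because one number can appear in more than one source, the combined count is smaller than the sum of the per-source hits. That is why the 378 + 684 + 618 hits reported above correspond to 1,356 distinct matched numbers.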

If a few academic researchers can get this far this quickly, it’s difficult to believe the NSA would have any trouble identifying the overwhelming majority of American phone numbers.

This shouldn’t be too surprising to anyone who has been paying attention. When the Snowden leaks broke, NSA whistleblower William Binney took issue with arguments like Obama’s that said metadata wasn’t revealing. Binney said collecting metadata can be more revealing than listening in to the content of phone calls.

This study represents just another in a long line of definitive knock-downs of pro-NSA arguments. The transparency that Snowden’s leaks have imposed on the government and its defenders has mortally embarrassed them and allowed for each of their arguments – which we would otherwise have to take on their word – to be disproven.

That is true in general, but it is especially true of the metadata program. The disclosure of this program proved James Clapper, the Director of National Intelligence, to be a bald-faced liar given that he testified to Congress that no such program existed. Then NSA chief Gen. Keith Alexander said the metadata program foiled 54 terrorist plots, a justification that was later proven (and admitted by Alexander) to be completely false. Then they said it was perfectly legal, until we found out that a FISC ruling found in 2011 that the NSA “frequently and systematically violated” statutory laws restricting how intelligence agents can search databases of Americans’ telephone communications. To add to that, a federal judge essentially ruled it unconstitutional. And now we discover that their “metadata-isn’t-really-invasive” argument is also baloney.

Before Edward Snowden, NSA overreach was, to borrow a phrase, an unknown unknown. After Edward Snowden, they have to lie about it…repeatedly…apparently without a whiff of shame.

15 Responses to “Stanford Study: It Is Trivially Easy to Identify People With Metadata”

  1. So the study says we can get names from phone numbers. This seems obvious, unless one took the trouble to operate entirely under aliases and P.O. boxes under aliases. And even if one kept their phone number quite private, the phone companies have it, right? And we know the phone companies cooperate with the NSA; AT&T had a whole room for the NSA under W. Bush, right? We know Verizon is buddy-buddy with the NSA. It seems the phone companies that dared to resist the NSA (Qwest) were punished for it. Qwest was denied government contracts, and the CEO claims he was hit with insider trading charges for resisting the NSA. So even if they can't get your name from Facebook, they can get it from the phone companies, who will likely comply. Unless one makes a great deal of effort to be anonymous. So kinda duh.

  2. "When the Snowden leaks broke, NSA whistleblower William Binney took issue with arguments like Obama’s that said metadata wasn’t revealing. Binney said collecting metadata can be more revealing than listening in to the content of phone calls"

    Yea, there's more to a statement like this than just finding out your name. With metadata you can track political organizing. Gee, you sure do email/phone the people in the local socialist/anarchist/libertarian/environmentalist/whatever group a lot. They can find out who your friends are. It's very easy to track political organizing this way. By the way, I see you're getting mailings from those petitions you signed. They can even track intellectual interests this way. Find a vector: this person seems to be "infecting" everyone with radical ideas.

    So you can know everything just by tracing the vectors. One might even say that's what it's all about.

    One open question is whether the NSA reads the content of phone calls/emails. I think where we're at is they deny it, and we basically all kinda know they do. But even if they didn't, enough could be had from metadata.


    A great little article that is written from the perspective of a British court actuary who uses two simple charts to identify Paul Revere as a potential troublemaker.

  4. The phone call is the data: the actual voice recording. The metadata, or data about the data, is the account number and all the information about that account: whose account it is, what their activity is like, when they call, where they are calling from, and to whom. There are all kinds of technical details about these calls, like the end points and the path that the data took and so forth.

    Then in a second program you collect all the actual voice calls, then record them with the end point and time information. You take the voice recordings, cross-reference them with the metadata, and voilà, you have the entire ball of wax.

    Let's say I want to look up all the voice conversations that John Smith made. I go and look up Obama and all the voice recordings that we captured from any device or phone that we know about. Then I say I need to look up all the voice recordings to a specific person. Then I scan the calls for Benghazi. I find the one I need and tell the president that I will blackmail him unless he does my bidding. That is how the system works to distort democracy.

    Naturally there are a bunch of Snowdens out there who can collect this information for the benefit of foreign entities or political parties, and no one knows about them. The only evidence is that we see political folks behaving against the interests of the nation. Or better yet, we give the Israelis the entire collection so they can blackmail our politicians.

  5. Who's to say politicians aren't being used, even willingly, by Israeli special agents? As they routinely act on that nation's behalf, you'd be sorely pressed to believe otherwise.

  6. here is what I expected. thank you

  7. Known unknown. :p
    Anarcho Capitalism

  8. In regard to email, there is no separate "metadata" from the content. Since early days, email headers – everything about from whom, to whom, systems transferred through, date etc. have been an intrinsic part of the message content. Metadata is a term that doesn't fit email technology well – which is why the experts gave up trying to guarantee secure email using current technology.

  9. Yes, this is exactly what it is all about. When the shit really hits the fan in this country and all the couch potatoes are out in the streets, because that and homelessness and starvation are the only options, all the leaders will be identified and rounded up. That's what it is all about.

    When all this is happening, THAT is when they will start listening to phone calls and reading the emails of these key individuals.

  10. Another Snowden leak said that we send the data to Israel unfiltered. I suppose they use their best discretion and only blackmail when it will be hard to trace the source.

  11. You really cannot secure email very well. Man in the middle will corrupt your detected encrypted messages and force you to send in the clear. Once they have your key they will let them through. Not that you cannot get around it, but it should not be so painful.

  12. We all already knew that states, governments and elected officials are cheats, liars and untrustworthy.
    Snowden furnished the proof, so now they will be obliged to pass new laws to make their lies "legal" so that we become the criminals.
    How about another piece of cake?

  14. Who's to say politicians aren't being used, even willingly, by Israeli special agents? As they routinely act on that nation's behalf, you'd be sorely pressed to believe otherwise.

  15. tanks in information
    you are really" good