Sometimes these inferences are wildly off the mark, as anyone who has been shown Internet ads
that are only vaguely interesting can attest. But when the ads are on track, they
can be eerily creepy—and we often don’t like it. It’s one thing to see ads for hemorrhoid
suppositories or services to help you find a girlfriend on television, where we know
they’re being seen by everyone. But when we know they’re targeted at us specifically,
based on what we’ve posted or liked on the Internet, it can feel much more invasive.
This makes for an interesting tension: data we’re willing to share can imply conclusions
that we don’t want to share. Many of us are happy to tell Target our buying patterns
for discounts and notifications of new products we might like to buy, but most of
us don’t want Target to figure out that we’re pregnant. We also don’t want the
data thefts and fraud that inevitably accompany such large databases.
When we think of computers using all of our data to make inferences, we have a very human way of thinking about it. We imagine how we would make sense of the data,
and project that process onto computers. But that’s not right. Computers and people
have different strengths, weaknesses, and limits. Computers can’t abstractly reason
nearly as well as people, but they can process enormous amounts of data ever more
quickly. This is why computers are better at working with metadata, which is structured
and quantitative, than at understanding the content of conversations. And they’re constantly improving;
computing power is still doubling every eighteen months, while our species’ brain
size has remained constant. Computers are already far better than people at processing
quantitative data, and they will continue to improve.
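To see why, consider how little code it takes to mine structured metadata. The sketch below, in Python with invented call records, counts contact pairs and flags late-night calls; doing the same with the recorded audio of those calls would require a machine that understands speech and context.

```python
# A minimal sketch (invented call records) of how easily metadata yields
# patterns. Summarizing who talks to whom, and when, takes a few lines;
# understanding what was said in those calls is a far harder problem.
from collections import Counter
from datetime import datetime

# Hypothetical call-detail records: (caller, callee, ISO 8601 timestamp).
records = [
    ("555-0101", "555-0199", "2015-03-01T09:15:00"),
    ("555-0101", "555-0199", "2015-03-02T09:20:00"),
    ("555-0101", "555-0142", "2015-03-02T22:05:00"),
]

# Count how often each pair of numbers is in contact.
pair_counts = Counter((caller, callee) for caller, callee, _ in records)

# Flag late-night calls, a pattern visible in metadata alone.
late_night = [r for r in records if datetime.fromisoformat(r[2]).hour >= 22]

print(pair_counts.most_common(1))  # the most frequent contact pair
print(late_night)                  # calls placed at or after 10 p.m.
```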
Right now, data mining is a hot technology, and there’s a lot of hype and opportunism
around it. It’s not yet entirely clear what kinds of research will be possible, or
what the true potential of the field is. But what is clear is that data-mining technology
is becoming increasingly powerful and is enabling observers to draw ever more startling
conclusions from big data sets.
SURVEILLING BACKWARDS IN TIME
One new thing you can do by applying data-mining technology to mass-surveillance data
is go backwards in time. Traditional surveillance can only learn about the present
and future: “Follow him and find out where he’s going next.” But if you have a database
of historical surveillance information on everyone, you can do something new: “Look
up that person’s location information, and find out where he’s been.” Or: “Listen
to his phone calls from last week.”
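To make concrete how mundane this capability has become, here is a minimal sketch in Python, assuming a hypothetical SQLite table of retained location records. The table, column names, and records are invented for illustration; the point is that “where has he been?” is an ordinary database query once the historical data exists.

```python
# A minimal sketch, assuming a hypothetical table of retained location
# records, of what surveilling backwards in time amounts to: an ordinary
# database query. Table and column names are invented for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE locations (person_id TEXT, lat REAL, lon REAL, seen_at TEXT)"
)
conn.executemany(
    "INSERT INTO locations VALUES (?, ?, ?, ?)",
    [
        ("subject-42", 40.7128, -74.0060, "2014-06-01T08:00:00"),
        ("subject-42", 40.7580, -73.9855, "2014-06-01T12:30:00"),
    ],
)

# "Find out where he's been": replay a person's past movements on demand.
for lat, lon, seen_at in conn.execute(
    "SELECT lat, lon, seen_at FROM locations "
    "WHERE person_id = ? ORDER BY seen_at",
    ("subject-42",),
):
    print(seen_at, lat, lon)
```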
Some of this has always been possible. Historically, governments have collected all
sorts of data about the past. In the McCarthy era, for example, the government used
political party registrations, magazine subscriptions, and testimony from friends,
neighbors, family, and colleagues to gather data on people. The difference now is
that the capability is more like a Wayback Machine: the data is more complete and
far cheaper to get, and the technology has evolved to enable sophisticated historical
analysis.
For example, in recent years Credit Suisse, Standard Chartered Bank, and BNP Paribas all admitted to violating laws prohibiting money transfers to sanctioned groups.
They deliberately altered transactions to evade algorithmic surveillance and detection
by “OFAC filters”—that’s the Office of Foreign Assets Control within the Department
of the Treasury. Untangling this sort of wrongdoing involved a massive historical
analysis of banking transactions and employee communications.
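For a sense of what such a filter does, here is a minimal sketch in Python of exact-match sanctions screening. The list entries and transaction fields are invented; real OFAC filters match fuzzily against far larger, regularly updated lists. The sketch also shows why deliberately altering a transaction’s fields defeats naive screening.

```python
# A minimal sketch of the kind of screening an "OFAC filter" performs:
# checking transaction parties against a sanctions list before settlement.
# The list entries and field names here are invented; real filters use
# fuzzy matching against far larger, regularly updated lists.
SANCTIONED_PARTIES = {"EXAMPLE TRADING CO", "SANCTIONED BANK LTD"}

def screen(transaction: dict) -> bool:
    """Return True if any party to the transaction is on the list."""
    parties = {transaction["sender"].upper(), transaction["beneficiary"].upper()}
    return bool(parties & SANCTIONED_PARTIES)

# Exact matching is what the banks defeated: deliberately stripping or
# altering the originator field lets a transaction sail through unflagged.
flagged = screen({"sender": "Example Trading Co", "beneficiary": "Acme Corp"})
evaded = screen({"sender": "E. T. Co", "beneficiary": "Acme Corp"})
print(flagged)  # True: caught by the filter
print(evaded)   # False: evades the naive filter
```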
Similarly, someone could go through old data with new analytical tools. Think about
genetic data. There’s not yet a lot we can learn from someone’s genetic data, but
ten years from now—who knows? We saw something similar happen during the Tour de France
doping scandals; blood taken from riders years earlier was retested with newer detection methods, and doping that had been undetectable at the time came to light.