influence of your first hilarious off-the-mark prior assumption about your therapist's perfect punctuality is, through this process, dissolved down to nothing.
This is the promise of sensed data, of telemetrics combined with easy-to-update statistical tools such as Bayes.
Finding You
In March 2010, Adam Sadilek, a young Czech-born researcher from the University of Rochester, set out with some colleagues to see how accurately they could predict the location of someone who had turned off his or her GPS, who wasn't geo-tagging tweets or posts, who was in effect going incognito. Sadilek and his team sampled the tweets of more than 1.2 million individuals across New York City and Los Angeles (America's chirpiest cities). After a month, the team had more than 26 million individual messages with which to work; 7.6 million of those tweets were geo-tagged.
They trained an algorithm using Bayesian machine learning to explore the potential patterns among the Tweeters. The idea was to uncover the conversations between the users, contextualize what conversations were taking place across the New York and Los Angeles landscapes, and see if they could use that information to learn about people who were friends with the geo-taggers but who weren't themselves geo-tagging.
Turns out that your friends' geo-tagged tweets provide a great indication of where you've been, even if you weren't in that place with that friend. Because you, like most people, are probably a creature of habit, where you've been is an excellent indicator of where you're going.
Let's say Sadilek's system has no "historical information" on you. You don't geo-tag tweets; you keep your phone's GPS setting off; you are invisible, a covert operative. But in order to maintain your cover, you established a Twitter account using a dummy e-mail address. Let's also say you've got two friends on Twitter. They're real friends, people you talk to about events in real life and with whom you relate in the real world. You see them in class, at clubs, in line at the post office. Like a lot of other people, these two friends do geo-tag their tweets. Sadilek's system can predict your location at any moment (down to 328 feet and within a twenty-minute frame) with 47 percent accuracy. That means he's got roughly a 50 percent chance of catching you at any given moment. [18]
I know, I know, you did everything right. You were a careful steward of your privacy. It's not fair that a twenty-five-year-old PhD grad from Czechoslovakia should be able to find out so much about you so effortlessly. It was your friends who gave you away without even realizing it. Now your not-so-secret-agent career is over.
I went to meet Sadilek at an AI conference. Sitting in the executive lounge on the top floor of the Toronto Sheraton, we overlooked downtown and saw people parking their bicycles, waiting for buses, talking on phones, walking with heads pointed toward shoes, white iPod cords dangling from their ears, people coming and going from little secret rendezvous that every one of them presumed were unknowable to the outside world. We talked a bit about human predictability.
"Somehow, growing up as a teenager, I always was sort of put off by how predictable people are. I never liked that. I liked people that were random."
Since entering the field of machine learning, Sadilek has come face-to-face with a hard truth. Human behavior is far more predictable than anyone ever predicted; surprisingly predictable, you might even say. One experiment in particular proved this in a way that astounded even Sadilek.
The year was 2011 and he was about to start an internship at Microsoft with researcher John Krumm. In his years of working at Microsoft, at a time when the company was at its most ambitious and adventurous, Krumm was able to amass a rather unique data set. He set out to make a sort of living