Journalism has been called “history’s first draft.” In that sense, the efforts of Seth Stephens-Davidowitz to decipher Google Trends data could be called computational social science’s first draft. Stephens-Davidowitz, who earned his Ph.D. in economics at Harvard and is a lecturer at the Wharton School of Business, is currently on a book tour promoting his new book, Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are, and spoke Tuesday night at the University of California, Santa Barbara.
The underlying conceit of Stephens-Davidowitz’s work, whether in the book, on the op-ed pages of the New York Times, in the classroom, or on campus, is that people’s activity on a search engine reveals much more about them than do surveys, polls, or social media. (Note the focus on ‘people’ and not ‘individuals’ – Google reports only aggregated and anonymized search results.) He dubs Google – and keep in mind he worked at Google as a data scientist – a “digital truth serum.”
Contrast that with Facebook, which Stephens-Davidowitz called a “digital brag-to-my-friends-how-good-my-life-is serum.” In Google searching, individuals expect privacy – more on that later – and so are more honest. Hence the title of the book – Everybody Lies – since it is only with Google, we might hyperbolically expect, that people are completely honest.
In support of that, he offered a series of examples where information derived from aggregated Google searches proved either powerful or even more predictive than professionally conducted surveys.
- Searching for topics such as ‘how to vote’ or ‘where to vote’ proved a better indicator of U.S. voters’ intention to vote than did polling results, and indicated that areas of supposed Hillary Clinton strength in the 2016 presidential vote were likely to prove disappointing.
- Inquiries on how to commit suicide also showed strong evidence of intent in various geographic areas.
- A heat map of searches for information on how to self-induce an abortion “correlates almost perfectly with places where it’s hard to get an abortion.” The state of Mississippi comes in at No. 1 on both maps.
- Searches for the so-called ‘N-word’ (and not in the context of hip-hop music) were the strongest predictor of an area voting for Donald Trump.
And many times Google searches reveal things no one knew to look for. For instance, looking at the most popular searches in India beginning “My husband wants …” leads to the top choice of “… me to breastfeed him.” India, added Stephens-Davidowitz, is the only country where that result tops the list, a finding supported by looking at other lactation-oriented inquiries that show up in the subcontinent, where only about half the searches for breastfeeding involve infants.
Interrogating Google, he continued, revealed something “so secretive and so taboo that while it’s widely known it’s not widely covered.”
Making something useful out of this data is another thing altogether, and much as Harvard’s Gary King insists that it’s the analysis, and not the ‘big’ or the ‘data’ that’s important, Stephens-Davidowitz acknowledged that finding the “right lens” to view search data matters greatly.
And while the Trends material has great promise for reach – the whole world – and for speed, it’s a blunt tool in many ways. Analysts can make educated guesses on where subjects physically are at the city level, and on some other demographic markers based on their browser or computer or language choices, but nothing definitive. Even that incomplete picture creates vivid concerns about privacy, especially since the “subjects” have not agreed to any such investigation (well, perhaps they did in those acres of fine print everyone agrees to just after download …) and are counting on the good graces of Google or other private companies not to discover and reveal more.
Stephens-Davidowitz, for his part, and perhaps given his ties to Google, said he has a lot of faith in Google’s reliability in protecting privacy, shielding the data from hackers, and resisting entreaties to share more. He based that assessment both on Google’s stellar bench of computer scientists and the billions it stands to lose if perceived as guilty of violating the public trust.
But that trust is already eroding, and may yet harm the ability to produce quick and accurate insights on the fly. Stephens-Davidowitz says that over the last five years he’s seen people go from having complete trust in their ability to ask Google embarrassing questions in private to the point today where many are skeptical that their privacy is guaranteed. Still, Stephens-Davidowitz said, he’d be more worried about breaches at smaller web companies that don’t have the chops of a Google.
And overall, Stephens-Davidowitz is more sanguine than disheartened. Not that he isn’t gloomy – look at the title of the book again. “I don’t think you can look at Google search data,” he said, “and not have a somewhat pessimistic view of human nature.” But he also sees the benefits of his quick insights.
For example, in reviewing the search data on suicidal ideation, a common precursor search suggests that the individuals asking how to commit suicide are both young and just diagnosed with a sexually transmitted disease. And so, he explains, “I’m optimistic because it gives policy options.”
Or even societal options. One of the other searches by those troubled teens is which celebrities have STDs. The results are few, even though the likelihood that many have had or do have them is great. But the digital silence merely amplifies the searcher’s despair. “Silly as it sounds,” says Stephens-Davidowitz, “if a bunch of celebrities came out and say they have herpes, it will save lives.”