The era of analyzing big data to research issues in social science is upon us. Given that the era only really arrived with the advent of the data itself, and those pools of data – sources such as social media, unstructured text, digital sensors, financial and administrative transactions – have only recently become widely available as commodities, it’s reasonable to think that big data research is a young academic’s game.
In fact, based on findings reported in a recently released white paper, that’s not necessarily so. There was no difference in career stage among those doing – or not doing – big data research among respondents to a survey. In short, the idea that early-career researchers were somehow more likely to be digital natives and therefore more apt to conduct computational social science than those whose PhDs were issued more than a decade ago was not sustained.
SAGE Publishing (the sponsor of Social Science Space) surveyed social scientists around the world to learn more about who engages in research using so-called ‘big data,’ what challenges they face, and what barriers confront those who are interested in conducting computational social science going forward. Summarizing those results, SAGE has published a white paper, Who Is Doing Computational Social Science? Trends in Big Data Research, authored by Katie Metzler, publisher for SAGE Research Methods; David A. Kim, of the Department of Emergency Medicine at Stanford University; Nick Allum, a professor of sociology and research methodology at the University of Essex; and Angella Denman of the University of Essex.
“The findings from this survey reveal that there is an appetite to engage with data at an accelerated rate among social scientists,” said SAGE’s director of global publishing, Ziyad Marar, “but that unique challenges persist related to such issues as interdisciplinarity, research design training, and access.”
The survey team initially contacted more than a half million social science contacts, of whom 9,412 fully completed the survey. The majority, 7,933, described themselves as from academe, with the next largest sector, government, providing 527 answers. A plurality of answers – 3,302 – came from the United States, followed by the United Kingdom (728), India (405), and Canada (353), although the responses were genuinely global, with 35 countries each supplying at least 50 completed surveys.
Disciplines of respondents were also all over the map, with education, psychology and health sciences each providing more than a thousand respondents. Nonetheless, fields as diverse as the law, nursing, marketing and history joined more traditional social science disciplines such as political science, demographics, criminology and sociology in supplying respondents.
One third of respondents self-identified as having been involved in big data research of some kind, with one in four of them reporting that all or most of their research involved big data or data science methods. Some 60 percent of researchers reporting big data work said they had conducted their big data research within the last 12 months.
Predictably, who is doing the most big data research is in large part explained by the type of research associated with the respondent’s discipline. The most common disciplines reporting any big data research were social statistics and research methods, where almost three out of five respondents had been involved in big data research at some point; economics (about half); demography, population studies, and human geography (slightly less than half); and health sciences (slightly less than two out of five).
“Overall,” wrote the white paper’s authors, “these percentages seem very high (especially in the case of history and anthropology, which are not typically disciplines associated with big data), and this further suggests that researchers who are very interested in big data and who are already engaged in big data research were more likely to complete the survey. It may also indicate ambiguity about what people understand by the terms big data and data science.”
Of the remaining two thirds of respondents, those who have not yet engaged in big data research, half (3,057 respondents) said that they are either “definitely planning on doing so in the future” or “might do so in the future.” That means that a substantial number of respondents don’t expect to do any big data work, period. While it might seem difficult to escape some brush with big data, some 1,083 respondents said they definitely are not planning on doing it.
Where the data comes from
When natural scientists grapple with big data, they almost inevitably receive raw information from instruments that were designed to provide them the material they wanted in a format for which they had prepared. Social and behavioral scientists, in contrast, are often tapping data flows designed by other people – entrepreneurs and government are key examples – and for different purposes. Think of Facebook, which in its earliest incarnation wasn’t about gathering information at all, merely facilitating connections among Harvard students.
The white paper authors asked social and behavioral researchers what data sources they used and what tools they used to tap these sources. Among respondents who are already active in computational social science, by far the most common data source they had most recently used was administrative data – government-generated data whose subjects are as diverse as government departments themselves and can include health, education or income. Some 55 percent of respondents reported having used such data in their most recent research involving big data.
The next largest source, cited by 29 percent, was social media data, such as Facebook or Twitter. (Multiple answers were possible.) The third most commonly cited was commercial or proprietary data, cited by 23 percent of respondents. Giving an idea of the scope of what can constitute ‘big data,’ the fourth most common response was photographs, video or audio sources.
Among that third of respondents who have already conducted computational social science, the tools they need are a prime subject. For example, since big data is by definition ‘big,’ a distributed computing infrastructure is often necessary. Among those who have used such systems, the most commonly used was Hadoop, followed by two tools from the Hadoop ecosystem, MapReduce and Spark. The authors of the white paper, however, wrote that respondents may have been confused by what counted as “other distributed computing.”
“Although 579 researchers answered with software that is used for big data research, 1248 respondents used traditional software (SPSS and STATA) for their research. While SPSS and STATA have both been enhanced to handle larger data sets, there is also a possibility that respondents who answered naming a traditional software package were either not working with very large data sets or were working with smaller subsets of a large data set, which is common among researchers in the social sciences engaging with social media data.”
The authors also asked active researchers if they had shared either their bespoke code or software they may have developed with other researchers. For a majority, the answer was no. Among those who had shared, the most common way to share was via email (19 percent of the 873 answering this question), followed by submitting supplementary material as part of the publishing process (12 percent). Only 56 reported using GitHub.
What are the challenges
Because ‘big data’ is new, interdisciplinary and, well, big, it would present “unique problems” to researchers, the authors posited. In fact, a lot of the biggest problems faced by researchers will ring true in any academic endeavor – elusive funding, elusive data and that elusive perfect collaborator.
Among big data researchers surveyed, 42 percent identified funding as a “big problem,” followed by 32 percent who cited gaining access to commercial or proprietary data and 30 percent “finding collaborators with the right skills and knowledge.” (Multiple answers were allowed.) Of course, the nature of those challenges may have a different complexion for social data researchers, and the respondents also identified challenges that definitely had a big data cast. For example, 30 percent cited learning new software as a major challenge, and 27 percent “learning new analytic methods for myself.”
Looking at that funding question a little more deeply, a plurality of big data researchers – 30 percent – said that university or institutional funding was their main source, followed by government or NGO sources (25 percent) and a science-funding body (15 percent). Some 12 percent said they had self-funded their work and 7 percent said they had tapped a private company.
When asked to name specific problems they had encountered in doing big data research, funding and data access remained the biggest issues. However, some new wrinkles developed, including developing effective research designs, establishing a successful career in an interdisciplinary field, and choosing a suitable journal for publishing findings. “Interestingly,” the authors write, “those who reported that most or all of their research was big data were more likely to say that ‘choosing a suitable journal’ was a problem for them compared to those whose research is less focused on big data.” Some 48 percent of respondents who already did big data research said their work had been published in a journal, including medical, social science, science, and methods journals – but rarely journals dedicated to publishing computational social science research, in large part because those are currently few and far between.
The survey asked researchers who weren’t doing big data work – but who were interested – what was staying their hand. Finding collaborators with the right skills and the amount of time required to learn a new field were cited as the biggest impediments, listed as “big problems” by 29 percent and 21 percent, respectively, of the 4,894 answering that question. Big data research is a new-ish field for social scientists, and despite the term’s currency, that novelty still creates roadblocks: Some 12 percent said “big data not recognized or used in my field” was a big problem, and an additional 40 percent said it was somewhat of a problem.
Those roadblocks are diminishing over time, the white paper’s authors conclude:
“… [A]t SAGE Publishing we believe that social research is at a turning point. However, the successful collection and rigorous analysis of this data require new skills, new collaborations, new research methods, and new computational tools. The findings of the survey suggest that many social scientists are already rising to some of the challenges posed by big data, and that a large number of social scientists are looking to engage in this kind of research in the future.”