The U.S. Department of Defense is trying to find a way to assess and improve the credibility of social and behavioral science research using algorithms. There’s lots to unpack in that statement, but a key question is whether machines can really determine the credibility of research, ideally without conducting actual replications.
Replication is a key practice in scientific research, playing a role in controlling the impact of sampling error, questionable research practices, publication bias and fraud. Efforts to replicate studies are supposed to help establish credibility within a field (although the bravery of re-testing foundational findings can sometimes create a crisis mentality when too many fail to measure up). Since the research community can offer informed judgments about such outcomes, the question arises again: could those outcomes be forecast formally, without actually conducting replications?
DARPA, the Pentagon’s Defense Advanced Research Projects Agency, funded the ‘Systematizing Confidence in Open Research and Evidence’ (SCORE) program to answer these questions. SCORE aims to generate confidence scores for research claims from empirical studies in the social and behavioral sciences. The confidence scores will provide a quantitative assessment of how likely a claim is to hold up in an independent replication, helping DARPA pick winners for funding and application in the field.
In a paper published by Royal Society Open Science, a team of researchers asks a more detailed question of the process: “Are replication rates the same across academic fields? Community forecasts from the DARPA SCORE programme.” Authors Michael Gordon, Domenico Viganola, Michael Bishop, Yiling Chen, Anna Dreber, Brandon Goldfedder, Felix Holzmeister, Magnus Johannesson, Yang Liu, Charles Twardy, Juntao Wang, and Thomas Pfeiffer used prediction markets and surveys of the research community to forecast replication outcomes for material of interest to DARPA. This was the authors’ second look at using prediction markets and surveys to predict replication; their initial proof-of-principle led them to elicit information on a large set of research claims with only a few actually being replicated.
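The article doesn’t detail which market-maker mechanism the SCORE markets used, but replication prediction markets of this kind commonly rely on Hanson’s logarithmic market scoring rule (LMSR), under which the price of a “will replicate” share can be read as the crowd’s current probability estimate. The following is a minimal, illustrative sketch of that idea, not the paper’s actual implementation; the liquidity parameter `b` and the function names are assumptions for the example:

```python
import math

def lmsr_cost(q_yes: float, q_no: float, b: float = 100.0) -> float:
    # Cost function of Hanson's logarithmic market scoring rule.
    # q_yes / q_no are the total shares sold of each outcome;
    # b is a liquidity parameter (illustrative value here).
    return b * math.log(math.exp(q_yes / b) + math.exp(q_no / b))

def lmsr_price(q_yes: float, q_no: float, b: float = 100.0) -> float:
    # Instantaneous price of a YES ("will replicate") share,
    # interpretable as the market's implied replication probability.
    e_yes = math.exp(q_yes / b)
    e_no = math.exp(q_no / b)
    return e_yes / (e_yes + e_no)

def buy_yes(q_yes: float, q_no: float, shares: float, b: float = 100.0) -> float:
    # Amount a trader pays to buy `shares` YES shares from the market maker:
    # the difference in the cost function before and after the trade.
    return lmsr_cost(q_yes + shares, q_no, b) - lmsr_cost(q_yes, q_no, b)

# With no trades, the implied probability is 0.5; buying YES pushes it up.
p0 = lmsr_price(0.0, 0.0)          # 0.5
cost = buy_yes(0.0, 0.0, 50.0)     # trader's cost for 50 YES shares
p1 = lmsr_price(50.0, 0.0)         # > 0.5 after the purchase
```

Traders who believe a claim is more likely to replicate than the current price suggests profit by buying YES shares, so prices aggregate dispersed beliefs into a single probability per claim.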
Most of the participants making predictions were drawn from academia, from undergraduate to emeritus professor roles, although the majority were in early or mid-career. Their disciplines fell into six groupings: economics; political science; psychology; education; sociology and criminology; and marketing, management and related areas. Respondents tended to focus their predictions on their own fields.
The paper reports that the participants expect replication rates to differ between fields, with the highest replication rate in economics and the lowest in psychology and education. Participants interested in economics were more optimistic about the replication rates in economics than those not in economics; no evidence was found for this effect in the other fields. Moreover, participants who stated that they had been involved in a replication study before forecast, on average, a lower overall replication rate.
The authors note that participants expected replication rates to increase over time. “One plausible explanation might be that participants expect recent methodological changes in the social and behavioural sciences to have a positive impact on replication rates,” according to the paper. “This is also in line with an increased creation and use of study registries during this time period in the social and behavioural science.”
The forecasts presented in this study focus on field-specific and time-specific replication rates rather than the probability of replication for individual claims. Explicit forecasts for overall replication rates might be more reliable than what can be inferred from forecasts on individual replications.
See the original article here.