The latest iteration of OpenAI’s artificial intelligence (AI) chatbot, ChatGPT, has an almost uncanny capability to write poetry and academic essays that are very difficult to distinguish from human-written work. Much like other companies linked to Elon Musk, OpenAI has recently caused a stir with it in the world of research. This is raising the specter of AI in the service of research fraud and of a race to the bottom in research output and publication. As John Gapper warned in the Financial Times, “…if an unreliable linguistic mash-up is freely accessible, while original research is costly and laborious, the former will thrive.” Does a new age of desktop paper mills within easy reach of anyone, anywhere, present a real and present danger to research integrity?
In short, the risk is already with us. In May, data sleuth Elisabeth Bik tweeted about how image fraud was being boosted by AI: generative adversarial network (GAN) technology, in which two neural networks (algorithms loosely modeled on the human brain) are pitted against each other to produce synthetic data, is capable of producing deepfakes in the biomedical literature. Ethics and integrity issues are growing exponentially across scholarly communication. F1000’s and Taylor & Francis’ figures tell a story that is reflected across academic publishing, with such cases representing 34 percent of ethics cases for F1000 and about 50 percent of T&F’s ethics cases. Other major issues include duplicate submissions, data integrity, citation manipulation and authorship integrity issues. As Sabina noted recently, the problem is significant, not just because of the volume and growth of these issues, but also because there are different types of paper mills, and they are all highly adaptive.
Investigating these issues amid shifting sands poses many challenges. Nevertheless, publishers play a vital role in ensuring the legitimacy and integrity of what we publish and disseminate across the world. We invest in systems, safeguards and expertise to ensure due process has been applied to the scholarly content we publish. So, when that process is manipulated and the integrity of the scholarly record is under threat, it’s vital that we take all steps necessary to protect it. Technology is playing an ever more important role for publishers. Detection of research integrity and publishing ethics issues needs to be scalable, because some types of misconduct only become noticeable when patterns are detected across a number of different articles and datasets. This is a key area where developers, publishers and other scholarly organizations are collaborating and investing, not just financially, but with time and effort too.
Systems, safeguards and expertise are just one part of the solution. At a recent Westminster Higher Education Forum, there was “wide agreement across the global research system” that open research is critical in reducing research waste and enabling scrutiny of data. We agree. Open data and materials make it harder to fabricate data and conclusions, and access to the underlying data by readers and AI means that issues are more likely to be noticed. As AI and automation, such as automated research workflows, increasingly become an integral part of research (particularly in the analysis of big data), making that data open will make it significantly easier to use AI to interrogate data for fraud. Furthermore, enabling and encouraging the publication of a broad range of outputs, including negative/null findings, protocols and incremental studies (a key element of the open research model), minimizes publication and editorial bias and provides additional accessible data for AI tools aimed at combatting research fraud.
Publishers themselves also need to be open to collaboration with stakeholders (including other publishers) across the research ecosystem to tackle the root causes, including by building a system of rewards and incentives that deters, rather than encourages, the use of paper mills. The STM Integrity Hub and its prototype paper mill detector show what can be achieved through cross-publisher collaboration. The use of automated AI processes that can spot duplicate publications and other issues across publishers is a crucial development, given publishers’ often distinct submission and publication systems.
However, AI tools cannot do this alone and human judgment also plays a crucial role in safeguarding research integrity. From F1000’s experience of open research, we know that rigorous checks prior to publication by both AI and experienced experts are integral to maintaining research integrity in publications.
There is also a crucial need for more training and education for researchers in publishing ethics as well as research integrity. Many instances of misconduct or bad practice are not deliberate, but rather the consequence of inconsistent quality of training. This includes training in both good research and good publishing practices, including the roles and responsibilities of authors. It’s also important for researchers to be aware of what good peer review looks like, given that most peer review is still typically conducted anonymously: most researchers only see peer review reports on their own work (unless they happen to also be an Editor of a journal). There is a key role for many of the stakeholders in the scholarly ecosystem to collaborate on this and make such training open, to ensure that researchers, wherever they are based, can access adequate, high-quality information.
A rapidly accelerating race is already under way between paper mills using ever more complex AI to produce fake papers at scale and publishers employing ever more sophisticated AI technology to detect issues. Ultimately, to paraphrase one of the best-known popular reflections on AI and humanity: The future of research is not set. There is no research integrity, but what we make for ourselves.