In a new study published in the Harvard Kennedy School Misinformation Review, researchers from the University of Borås, Lund University and the Swedish University of Agricultural Sciences found a total of 139 papers with suspected deceptive use of ChatGPT or similar large language model applications. Of these, 19 appeared in indexed journals, 89 in non-indexed journals, 19 were student papers found in university databases, and 12 were working papers (mostly in preprint databases). Health and environment papers made up around 34% of the sample, and 66% of those appeared in non-indexed journals.
The use of ChatGPT to generate text for academic papers has raised concerns about research integrity.
Discussion of this phenomenon is ongoing in editorials, commentaries, opinion pieces, and on social media.
There are now several lists of papers suspected of GPT misuse, and new papers are constantly being added.
While many legitimate uses of GPT for research and academic writing exist, its undeclared use — beyond proofreading — has potentially far-reaching implications for both science and society, but especially for their relationship.
“One of the major concerns with AI-generated research is the increased risk of evidence hacking — that fake research can be used for strategic manipulation,” said University of Borås researcher Björn Ekström.
“This can have tangible consequences as incorrect results can seep further into society and possibly also into more and more domains.”
In their study, Dr. Ekström and his colleagues searched and scraped Google Scholar for papers that included specific phrases known to be common responses from ChatGPT and similar applications with the same underlying model: ‘as of my last knowledge update’ and/or ‘I don’t have access to real-time data.’
This made it possible to identify papers that likely used generative AI to produce text; the search returned 227 papers.
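To give a concrete sense of this approach, here is a minimal sketch of what such an exact-phrase search against Google Scholar might look like. The fingerprint phrases are the ones named in the study, and the query URL is Google Scholar's public search endpoint, but everything else (headers, HTML selectors, result handling) is an illustrative assumption, not the authors' actual pipeline. Note also that Google Scholar rate-limits and blocks automated access, so any real harvesting would need to respect its terms of service.

```python
import requests
from bs4 import BeautifulSoup

# Fingerprint phrases that commonly appear in unedited ChatGPT output,
# as used in the study's Google Scholar queries (quoted for exact match).
PHRASES = [
    '"as of my last knowledge update"',
    '"I don\'t have access to real-time data"',
]

def search_scholar(phrase: str, start: int = 0) -> list[dict]:
    """Fetch one page of Google Scholar results for an exact-phrase query.

    Hypothetical sketch: the CSS selectors below match Scholar's HTML at
    the time of writing but are not a stable API, and Scholar aggressively
    blocks scripted access (expect CAPTCHAs).
    """
    resp = requests.get(
        "https://scholar.google.com/scholar",
        params={"q": phrase, "start": start},
        headers={"User-Agent": "Mozilla/5.0"},  # Scholar rejects bare clients
        timeout=30,
    )
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    results = []
    for hit in soup.select(".gs_r .gs_ri"):  # one block per search result
        title = hit.select_one(".gs_rt")
        meta = hit.select_one(".gs_a")  # author, venue, and year line
        results.append({
            "title": title.get_text(" ", strip=True) if title else None,
            "source": meta.get_text(" ", strip=True) if meta else None,
        })
    return results

if __name__ == "__main__":
    for phrase in PHRASES:
        for hit in search_scholar(phrase):
            print(hit["title"], "|", hit["source"])
```

Hits retrieved this way would still need manual review, as in the study, to separate declared or otherwise legitimate uses of GPTs from undeclared ones.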
Of these, 88 were written with legitimate and/or declared use of GPTs, while 139 were written with undeclared and/or fraudulent use.
The majority (57%) of the questionable papers dealt with policy-relevant subjects (environment, health, and computing) that are susceptible to influence operations.
Most were available in several copies on different domains (e.g., social media, archives, and repositories).
“If we cannot trust that the research we read is genuine, we risk making decisions based on incorrect information,” said University of Borås Professor Jutta Haider.
“But as much as this is a question of scientific misconduct, it is a question of media and information literacy.”
“Google Scholar is not an academic database,” she noted.
“The search engine is easy to use and fast yet lacks quality assurance procedures.”
“That’s already a problem with regular Google results, but is even more problematic when it comes to making science accessible.”
“People’s ability to decide which journals and publishers — for the most part — publish quality-reviewed research is important for finding and determining what constitutes reliable research and is of great importance for decision-making and opinion formation.”
_____
Jutta Haider et al. 2024. GPT-fabricated scientific papers on Google Scholar: Key features, spread, and implications for preempting evidence manipulation. Harvard Kennedy School Misinformation Review 5 (5); doi: 10.37016/mr-2020-156