Scientists Can Spot ChatGPT in Academic Texts with 99% Accuracy

As ChatGPT finds its way into our everyday lives, it not only represents a leap into the digital future but also raises serious concerns. Scientists, in particular, are wary, trying to counteract the emergence of seemingly authentic falsehoods in scientific publications.
Scientists from the University of Kansas, led by chemist Heather Desaire, have published an article in the peer-reviewed journal Cell Reports Physical Science. The study applies machine learning to tell academic texts written by humans apart from those generated by ChatGPT. According to the group's findings, their detection tool reaches 99% accuracy.
Scientists vs ChatGPT. Source: https://www.sciencedirect.com

What sets an AI's text apart from a scholar's?

The authors of the paper stress the inherent ambiguity and potential risks of ChatGPT's output. They also point out that the technology could be used not only by students but by scholars as well. Tools already exist that can distinguish AI-generated from human-written text; one of the best known and most effective is the RoBERTa-based detector.

This detector correctly attributes authorship in over 98% of cases, yet it is not recommended for evaluating scholarly writing: like many other detectors, it is trained on general-purpose text and performs noticeably worse on specialized content.

During their research, Desaire's team analyzed 64 documents written by humans and 128 texts generated by ChatGPT. From this training set they extracted 1,276 example paragraphs. By comparing these examples, the researchers identified four categories of features that distinguish a chatbot's writing from a human's (a code sketch of how such features might look follows the excerpt below). These features include:

  • Paragraph complexity
  • Variety in sentence length
  • Punctuation usage
  • Presence of "common" words

"Two of the four categories of features used in the model are ways in which ChatGPT produces less complex content than humans. The largest distinguishing features were the number of sentences per paragraph and the number of total words per paragraph. In both cases, ChatGPT’s averages were significantly lower than human scientists," reads the paper.
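The paper does not publish its feature-extraction code, but its two strongest signals are simple counts. Here is a minimal Python sketch of how such paragraph-level complexity features might be computed, assuming a naive regex-based sentence splitter rather than the authors' actual preprocessing:

```python
import re

def complexity_features(paragraph: str) -> dict:
    """Paragraph-level complexity counts of the kind the study describes.

    A minimal sketch, not the authors' pipeline: sentences are split on
    terminal punctuation with a naive regex, and words on whitespace.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
    words = paragraph.split()
    return {
        "sentences_per_paragraph": len(sentences),
        "words_per_paragraph": len(words),
    }

# Short, uniform paragraphs (the pattern the study associates with ChatGPT)
# score low on both counts.
print(complexity_features("Short sentence. Another one. And a third."))
# -> {'sentences_per_paragraph': 3, 'words_per_paragraph': 7}
```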
The researchers also highlighted a noticeable difference in sentence structure. Human scholarly writing is marked by variable sentence length, often mixing very long (over 35 words) and very short (fewer than 10 words) sentences. AI-generated texts show little of this variability.
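Under the same naive-splitting assumption, that variability check reduces to flagging whether a paragraph contains both extremes; the thresholds below are the ones the study mentions:

```python
import re

def sentence_length_variety(paragraph: str) -> dict:
    """Flags the sentence-length extremes the study points to: very long
    (over 35 words) and very short (fewer than 10 words) sentences within
    a single paragraph. Uses the same naive splitter as the sketch above."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
    lengths = [len(s.split()) for s in sentences]
    return {
        "has_long_sentence": any(n > 35 for n in lengths),
        "has_short_sentence": any(n < 10 for n in lengths),
        "length_spread": max(lengths) - min(lengths) if lengths else 0,
    }
```

By the study's observations, human-written paragraphs would tend to set both flags, while ChatGPT's paragraphs rarely do.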

Another noteworthy observation concerns writing style. The study found that ChatGPT tends to convey information in more general terms, favoring less specific phrasing (such as "researchers" and "others") and frequently using single quotation marks. Scholars, by contrast, often incorporate proper names, acronyms, numerical data, paper titles, and author names, and they employ a wider variety of punctuation marks: dashes, parentheses, colons, semicolons, and question marks.
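These stylistic cues are also straightforward to count mechanically. A hedged sketch (the study's exact feature definitions are not reproduced here, and counting "-" conflates dashes with ordinary hyphens):

```python
def style_markers(text: str) -> dict:
    """Counts the stylistic cues the article mentions: single quotation
    marks (reportedly more common in ChatGPT output) versus the richer
    punctuation of human scholarly prose. Illustrative only."""
    return {
        "single_quotes": text.count("'"),
        "dashes": text.count("-"),  # crude: also matches hyphens
        "parentheses": text.count("(") + text.count(")"),
        "colons": text.count(":"),
        "semicolons": text.count(";"),
        "question_marks": text.count("?"),
    }
```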

The features identified by the researchers delivered 99.5% accuracy on the sample paragraphs. The scientists emphasize that the study was primarily meant to conceptualize and test a tool, which is why its scope was limited. Further work is needed to assess the model's effectiveness and its reliability in attributing document authorship.