AI-GENERATED TEXT VS. HUMAN-MADE: HOW TO RECOGNIZE IT?

January 11, 2023

Artificial intelligence (AI) is growing significantly in many areas of our daily lives, from the personal to the professional. Last November, OpenAI released to the public a beta version of ChatGPT, a conversational chatbot that answers questions posed by the user. It has proven capable of generating text so well written and informative that it can be difficult to distinguish from text written by a person.

Since its release, more than one million users have tried the chatbot. This has intrigued the AI community, and it is clear that AI is generating a growing share of the text on the Internet. ChatGPT has produced jokes, children's stories, scripts for made-up stories, drafts of emails, and answers to any question that pops into the user's mind (always within certain limits, for example around racist, discriminatory, or sentiment-related content).

From one perspective, using AI to generate text has its advantages, but it can also cause problems, especially in academic or journalistic settings. A professor or editor-in-chief, for instance, may struggle to tell whether a text was produced by an AI-powered tool or by a student or journalist. So how can we tell whether a text was written by a human or by AI software?

Researchers try to tell the two apart using different methods. The first is to analyze aspects such as fluency, repetition of certain words, punctuation patterns, and sentence length. For instance, AI-generated content often includes very short sentences, because AI tries to imitate human writing but has not yet mastered complex sentence structure. In addition, since machines collect data from different sources, they sometimes make mistakes, so a text with several discrepancies between facts and numbers may well have been written by AI.
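The surface features mentioned above (sentence length, word repetition, punctuation) are easy to measure. The following is only a minimal sketch in Python: the function name style_features, the choice of statistics, and the sample text are assumptions made for illustration, and numbers like these are weak hints at best, not a verdict on who wrote a text.

```python
import re
from collections import Counter


def style_features(text: str) -> dict:
    """Compute a few simple surface statistics for a piece of text."""
    # Naive sentence split: break after ., ! or ? when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = re.findall(r"[A-Za-z']+", text.lower())
    counts = Counter(words)
    punctuation = re.findall(r"[,;:!?.]", text)

    return {
        "sentence_count": len(sentences),
        # Average number of words per sentence.
        "avg_sentence_length": len(words) / len(sentences) if sentences else 0.0,
        # Distinct words divided by total words; low values mean more repetition.
        "type_token_ratio": len(counts) / len(words) if words else 0.0,
        "most_common_word": counts.most_common(1)[0] if counts else None,
        "punctuation_per_word": len(punctuation) / len(words) if words else 0.0,
    }


sample = "The cat sat. The cat ran. The cat slept."
features = style_features(sample)
print(features)
```

On the sample, the very short sentences and the heavy repetition of the same words show up as a low average sentence length and a low ratio of distinct words to total words.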

There are also spell-checking programs that can flag unusual mistakes or sentences that are too complex or too simple. Finally, intuition can help spot suspiciously "perfect" texts. After all, it is rare for a language model to make typos, so typos are a good indicator that a text was written by a person.

However, none of these efforts can be considered fully successful. At this point, AI-based language models themselves can be used to address this conundrum and help detect their peers.

The American magazine Fast Company presented, in one of its reports, several tools developed to identify text written by AI. The first one mentioned was created by OpenAI, the same company that has generated so much interest in the topic with its very effective ChatGPT. Its "artificial" text detection tool is called GPT-2 Output Detector Model. It was introduced in 2019 with a demo version accessible online, where you can paste a text and get an estimate of the probability that it was written by a machine.

According to many tests conducted by OpenAI, the detection rate of this system is relatively high, "but it needs to be complemented with metadata, human judgment, and public education approaches to be more effective."

Another tool that was released shortly after GPT-2 is GLTR (Giant Language Model Test Room). This is an algorithm developed by experts at the MIT-IBM Watson AI Lab and Harvard University's Natural Language Processing Group that bases its AI text detection system on the idea that lookalikes recognize each other.

Thus, if the algorithm can easily predict the next word in a given sentence, it treats that as a sign that the sentence was written by an AI. The main idea behind the tool's design is that people are more likely than machines to use unusual words in their writing, while AI tends to use more common words.
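The idea described above can be illustrated with a toy example. The sketch below is not GLTR's code: GLTR relies on a large neural language model, while this example swaps in a tiny word-pair counter trained on a made-up corpus, and all function names here are assumptions chosen for illustration. It ranks each word among the model's guesses for that position and reports how often the actual word was the top guess.

```python
from collections import Counter, defaultdict


def train_bigram_model(corpus: str) -> dict:
    """Count which word follows which; a toy stand-in for a real language model."""
    words = corpus.lower().split()
    following = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        following[prev][nxt] += 1
    return following


def prediction_ranks(model: dict, text: str) -> list:
    """For each word after the first, return its rank among the model's guesses.

    Rank 1 is the model's top guess. None means the model never saw
    that word after the previous one.
    """
    words = text.lower().split()
    ranks = []
    for prev, actual in zip(words, words[1:]):
        ranked = [w for w, _ in model[prev].most_common()] if prev in model else []
        ranks.append(ranked.index(actual) + 1 if actual in ranked else None)
    return ranks


def share_of_top_guesses(ranks: list, k: int = 1) -> float:
    """Fraction of words that were among the model's top-k predictions."""
    if not ranks:
        return 0.0
    return sum(1 for r in ranks if r is not None and r <= k) / len(ranks)


# Made-up training text, purely for illustration.
corpus = "the cat sat on the mat . the cat sat on the sofa . the dog sat on the mat ."
model = train_bigram_model(corpus)

predictable = "the cat sat on the mat"
surprising = "the dog juggled on the sofa"

for text in (predictable, surprising):
    ranks = prediction_ranks(model, text)
    print(text, ranks, share_of_top_guesses(ranks))
```

A text made mostly of top-ranked words is what the model itself would have produced, which is the signal this kind of detector looks for. A text with many low-ranked or unseen words looks more like human writing.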

OpenAI itself is also working on watermarks to identify whether a text was generated by AI or written by a human. According to Scott Aaronson, a member of OpenAI's research team, the company already has a prototype of the tool. He stated that the goal is for any long text generated by ChatGPT to contain a secret signal proving that it was generated by GPT. Only the company would have access to the key needed to detect that signal.

He added: “We want it to be much harder to take [an AI system] output and pass it off as if it came from a human. This could be helpful for preventing academic plagiarism, obviously, but also, for example, mass generation of propaganda… Or impersonating someone’s writing style in order to incriminate them.”
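To make the idea of a secret signal tied to a key more concrete, here is a toy sketch. It is not OpenAI's prototype, whose details are not given here. It assumes a simplified setting in which the generator sometimes has several equally acceptable words to choose from and uses a secret key to break the tie. The key value, the function names, and the word lists are all hypothetical.

```python
import hashlib
import hmac


def keyed_score(key: bytes, prev_word: str, word: str) -> float:
    """Pseudorandom score between 0 and 1 that only the key holder can compute."""
    message = f"{prev_word}\x00{word}".encode("utf-8")
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def choose_word(key: bytes, prev_word: str, candidates: list) -> str:
    """Among equally acceptable candidate words, pick the one the key favours."""
    return max(candidates, key=lambda w: keyed_score(key, prev_word, w))


def watermark_strength(key: bytes, words: list) -> float:
    """Average keyed score over consecutive word pairs.

    Text written without the key should average around 0.5.
    """
    pairs = list(zip(words, words[1:]))
    if not pairs:
        return 0.0
    return sum(keyed_score(key, p, w) for p, w in pairs) / len(pairs)


# Hypothetical key; in the scenario described, only the provider would hold it.
SECRET_KEY = b"hypothetical-secret-key"

# Made-up data: each slot lists words treated as interchangeable at that position.
slots = [["quick", "fast", "swift"], ["fox", "hare", "deer"], ["jumps", "leaps", "hops"]]

watermarked = ["the"]
for candidates in slots:
    watermarked.append(choose_word(SECRET_KEY, watermarked[-1], candidates))

print(" ".join(watermarked), watermark_strength(SECRET_KEY, watermarked))
```

The text reads normally, but anyone holding the key can recompute the scores. Over a long enough text, an average well above 0.5 would suggest the words were chosen with that key, while a reader without the key sees nothing unusual. This is why such a scheme would need long texts to give a reliable answer.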

Since the technology is moving rapidly, detection models will have to keep pace with it in order to address and prevent ethical problems.

Final thoughts

Models like ChatGPT can help on a daily basis, generating polished messages in a couple of seconds, but it is also necessary to keep track of all these texts as they spread around the Internet. AI produces text quickly and fluently, but it does not know what that text means. These models predict the next most likely word in a sentence, so they do not know whether what they are writing is true or false.

What is certain is that, as with any technology, we must understand how to use it. The solution is surely not to outlaw or demonize it. On the contrary, it is to understand how these tools can improve our abilities and talents while ensuring that they do not impede our development in various disciplines and fields.
