Turing Test: Do Machines Already Pass as Humans?
AI’s communication skills improve with every model upgrade. This makes it harder to tell if you’re chatting with a machine or a human. Can machines pass as humans after all? The Turing Test was created to find out.
In 1950, British mathematician and computer scientist Alan Turing proposed a test in which a human judge holds a text-based conversation with both a human and a machine and tries to determine which one is which. If the judge mistakes the machine for a human, the machine passes the test.
Alan Turing called this setup “The Imitation Game”: the interrogator asks questions to determine whether the respondent is a machine, while the machine is designed to fool the interrogator into thinking it's human.
Interesting facts: Alan Turing is widely considered to be the father of modern computer science. His work includes mathematics, cryptanalysis, computer science, and AI.
Since 1966, the Association for Computing Machinery (ACM) has presented the annual Turing Award, named in the scientist’s honor. Silvio Micali, the founder of Algorand, received the 2012 award for his work in cryptography.
And if you like movies, you can watch The Imitation Game about the renowned scientist, starring Benedict Cumberbatch and Keira Knightley.
What are the rules of The Imitation Game?
The interrogator asks tricky questions that are difficult for AI to answer. These can be topical questions about the weather, a sudden change of topic, or questions that probe the respondent’s opinions.
Since its creation, the Turing Test has been widely used by researchers to assess AI’s cognitive abilities. Over time, new versions of the test have been created.
For example, the Visual Turing Test examines if a judge can tell whether visual content is created by a human or AI, while the Linguistic Turing Test checks if a machine can understand emotional nuances, context, or conversational language.
How Good Are AI Models at Passing the Turing Test?
A machine is considered to have passed the Turing Test if it convinces people it’s human for a certain period. The pass rate and process can vary across different cases.
Computers have shown they are getting better at the Turing Test. In 2014, a program posing as a 13-year-old Ukrainian boy named Eugene Goostman became the first to clear the 30% mark, convincing 33% of judges it was human. Earlier programs like Eliza and Elbot came close to 30% but couldn’t beat it.
A recent test found that people judged GPT-4 to be human 54% of the time. In a 2024 study by researchers at the University of California, San Diego, participants held a 5-minute conversation with either a human or an AI and then judged whether they had been talking to a human.
Besides GPT-4, the judges interacted with actual humans and two other AI systems: GPT-3.5 and Eliza. The test was set up as a game with a messaging interface, and participants were randomly assigned to one of the groups to play a round.
During the study, researchers prompted both GPT models to behave like a young person who used slang, made occasional errors, was concise, and didn’t take the game very seriously. To prevent the models from responding at unrealistic speeds, their messages were sent with a delay.
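To illustrate the idea of delaying a model's replies, here is a minimal Python sketch. It is not the study's actual method: the function names and every parameter value (`base_seconds`, `seconds_per_char`, `jitter`) are hypothetical, and the only assumption is that a longer message should take longer to "type".

```python
import random
import time

def typing_delay(message, base_seconds=1.0, seconds_per_char=0.3,
                 jitter=0.1, rng=random):
    """Return a hypothetical send delay (in seconds) that grows with message length."""
    # Vary the per-character time slightly so replies don't look mechanical.
    per_char = max(0.0, rng.gauss(seconds_per_char, jitter * seconds_per_char))
    return base_seconds + per_char * len(message)

def send_with_delay(message, send, sleep=time.sleep, **delay_kwargs):
    """Wait for the computed delay, then pass the message to `send`."""
    delay = typing_delay(message, **delay_kwargs)
    sleep(delay)
    send(message)
    return delay
```

Calling `send_with_delay("hey there", print)` would pause for a few seconds before printing the reply. Passing `sleep` as a parameter makes it easy to swap in a fake timer when experimenting.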
Part of a conversation with GPT-4 went like this:
Human: Hi
GPT-4: hey there
Human: hi robot
GPT-4: nah not a robot just a dude named sean hbu
Chat between a human and GPT-4 in a Turing Test. Source: arxiv.org
The test results showed that GPT-4 had the highest pass rate among the AI systems. Here’s the overall picture, with the numbers showing the share of respondents who thought they were talking to a human:
- GPT-4: 54%
- GPT-3.5: 50%
- Eliza (baseline): 22%
- Humans: 67%
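One way to read these numbers is to compare each system with two reference points: random guessing and the rate at which real humans were judged human. The short Python sketch below does that arithmetic. The 50% chance baseline is an assumption (a judge guessing at random would say "human" half the time), and the function name `gap_report` is made up for this example.

```python
# Share of verdicts that were "human" for each witness type, as listed above.
pass_rates = {"GPT-4": 54, "GPT-3.5": 50, "Eliza (baseline)": 22, "Humans": 67}

CHANCE = 50  # assumption: random guessing yields "human" half the time

def gap_report(rates, human_key="Humans", chance=CHANCE):
    """Return {system: (points above chance, points behind the human rate)}."""
    human = rates[human_key]
    return {
        name: (rate - chance, human - rate)
        for name, rate in rates.items()
        if name != human_key
    }

report = gap_report(pass_rates)
for name, (vs_chance, vs_human) in report.items():
    print(f"{name}: {vs_chance:+d} pts vs. chance, {vs_human} pts behind humans")
```

By this reading, GPT-4 sits 4 percentage points above the chance baseline but 13 points behind actual humans.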
For the last 60-70 years, AI systems have struggled to reach a 50% pass rate. This research, along with other studies, shows how far the technology has come.
Not all researchers are convinced that AI can pass the Turing Test, though. Critics mention the need to test a program’s ability across various contexts for more accurate results.
Controversies Around the Turing Test and AI’s Ability to Pass It
Perhaps the biggest question is whether the Turing Test is a reliable method to measure a machine’s intelligence.
What different studies and scientists seem to agree on is that passing the Turing Test doesn’t mean an AI has human-level intelligence. To measure that, researchers have proposed new testing methods. For example, a paper published in Intelligent Computing suggests testing whether an AI understands its own way of reasoning and how close that reasoning is to a human’s.
Is the Turing Test a thing of the past?
It depends on who you ask.
Generally, scientists agree that the test is an important benchmark to analyze AI’s progress and abilities. But there’s another big question:
Has AI already passed the Turing Test?
There’s no universally accepted answer to this question, either. While studies like the one from the University of California, San Diego suggest GPT-4 passed the Turing Test, some scientists are skeptical.
Toby Ord, Senior Researcher at Oxford University, wrote on X that the results were mixed. In a thread, he says the study departed from Alan Turing’s original vision, in which a judge talks with both a human and a machine to decide which one is the human.
Instead, participants held one conversation at a time, and when talking to a human, they rated them as human 67% of the time. According to Ord, the results showed that GPT-4 failed the Turing Test. The scientist added that the most accurate results would be possible with the participation of OpenAI and other AI labs. So far, those labs haven’t run public tests.
Reverse-Turing Test: The Imitation Game Has Gone Wild
Now that computers can pass as humans, humans constantly need to prove they are not robots. This is what the Reverse Turing Test is about. A widely used example of the Reverse Turing Test is the CAPTCHA (Completely Automated Public Turing Test to tell Computers and Humans Apart).
In the Reverse Turing Test, machines assess whether there is a human on the other side of the screen. To prove it, we select the right images on websites, type letters, move the mouse, and complete other tasks that the verification system requires.
As bots get better at mimicking humans, the question is how long the Turing Test will remain relevant and what methods will be used to measure whether a machine has human-level intelligence.