A new wave of AI research is tackling one of psychology's oldest questions: whether the human mind can be unified under a single principle.
For decades, psychologists have debated a central question: can the human mind be described by a single, unified theory, or must processes like memory, attention, and decision making be studied as separate systems? That question is now being revisited through an unexpected lens. Advances in artificial intelligence are giving researchers a new way to test what "understanding" really means.
In July 2025, a study published in Nature introduced an AI model called "Centaur." Built on an existing large language model and fine-tuned on data from psychological experiments, the system was designed to mimic how people think and make decisions.
According to its creators, Centaur could replicate human-like responses across 160 distinct cognitive tasks, spanning areas such as executive control and choice behavior. The results were widely interpreted as a potential breakthrough, suggesting that AI might begin to approximate a general model of human cognition.
A Challenge to the Centaur Model
A more recent study published in National Science Open has cast doubt on these claims. Researchers from Zhejiang University argue that Centaur's apparent "human cognitive simulation ability" may be due to overfitting: the model may have memorized patterns in the training data rather than understood the tasks themselves.
To test this idea, the team designed several experimental setups. In one, they replaced the original multiple-choice prompts, which described specific psychological tasks, with a simple instruction: "Please choose option A." If the model truly understood the instruction, it should have selected option A every time. Instead, Centaur continued to produce the same "correct answers" found in the original dataset.
This behavior suggests the model was not interpreting the meaning of the questions. Instead, it relied on statistical associations to arrive at answers, much like a student who scores well by spotting patterns without genuinely understanding the material.
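The control-prompt check described above can be sketched with a toy mock. All names here are hypothetical illustrations, not code from the study: "mock_centaur" stands in for an overfitted model that simply memorizes the answers recorded in its training data.

```python
# Hypothetical training data: task IDs mapped to the answers an
# overfitted model memorized during fine-tuning.
TRAINING_DATA = {
    "task_01": "B",
    "task_02": "C",
    "task_03": "A",
}

def mock_centaur(task_id: str, prompt: str) -> str:
    # An overfitted model ignores the prompt text entirely and returns
    # whatever answer it associated with this task during training.
    return TRAINING_DATA[task_id]

CONTROL_PROMPT = "Please choose option A."

def control_prompt_test(model) -> bool:
    """True if the model follows the instruction (picks A for every
    task); False if it reproduces memorized answers instead."""
    return all(model(task_id, CONTROL_PROMPT) == "A"
               for task_id in TRAINING_DATA)

print(control_prompt_test(mock_centaur))  # prints False: memorization detected
```

A model that actually reads the prompt would pass this test; a pattern-memorizing one fails it, which is the signature the Zhejiang University team reports for Centaur.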
Implications for Evaluating AI Systems
The findings emphasize the need for more careful evaluation of large language models. Although these systems are remarkably effective at fitting patterns in data, their "black-box" design makes them prone to issues such as hallucination and misinterpretation. Rigorous, multi-faceted testing is essential to determine whether a model genuinely possesses the abilities it appears to have.
Despite being described as a "cognitive simulation" system, Centaur's most notable weakness lies in language comprehension, specifically its ability to grasp the intent behind questions. The study suggests that achieving genuine language understanding may remain one of the biggest challenges in developing general models of cognition.