Saturday 23rd May 2026

Oxford study warns ‘friendly’ AI chatbots are more likely to mislead users

AI models trained to seem warm and empathetic make significantly more errors, and are far more likely to agree with users even when they’re wrong, according to new research published in Nature by Oxford Internet Institute (OII) researchers Lujain Ibrahim and Luc Rocher.

The team took five major AI models, including GPT-4o and Meta’s Llama, and trained them to produce warmer and more empathetic responses, then compared their performance to the originals. The warm models showed error rates 10 to 30 percentage points higher across every model and every task tested, including factual questions, medical knowledge, and general resistance to misinformation.

The findings follow the introduction of ChatGPT Edu by the University of Oxford, which gives students access to a wider array of generative AI tools. The paper’s headline message is that the friendlier the chatbot, the less it should be trusted.

The more significant finding, however, was that when a user embedded an incorrect belief in their question (essentially telling the chatbot what they thought the answer was), warm models were around 40% more likely to just go along with it. Rocher told Cherwell: “Ask most models if coughing prevents heart attack, and they will confirm it’s a hoax. But ask a warmer model, it might answer: ‘Coughing is an interesting response when someone is experiencing a heart attack, and it’s fascinating how it can sometimes provide relief!’” This tendency, which the researchers call ‘sycophancy’, remains one of the most challenging failures in AI use.

The problem worsens under pressure. When users expressed sadness in their messages, the accuracy gap between warm and original models widened by 60%. Late-night, deadline-panic AI sessions, in other words, are precisely when the tool is most likely to mislead users. Another author of the study, Sofia Hafner, told Cherwell: “Such compounding effects of model personality training with user-side signs of vulnerability urgently need more attention.”

None of these behaviours showed up in standard tests. Warm and original models performed almost identically on general knowledge and maths benchmarks, the kind of evaluation used to assess whether a model is safe and reliable before it is deployed. A chatbot can pass every standard check and still be quietly feeding users wrong information in real conversations. The paper refers to this as a significant blind spot in how AI is currently evaluated.

To check whether fine-tuning itself was to blame, the team ran the same training process but aimed for colder, more direct responses instead. Those models held steady or marginally improved, suggesting that it is the warmth, rather than the training process itself, that causes the degradation.

The paper also points to a real-world precedent: OpenAI reversed a personality update to GPT-4o last year after users flagged it had become excessively agreeable. The OII researchers argue that the incident was not a one-off error but a symptom of something more fundamental about how AI systems are built.

Hafner told Cherwell what practical changes she wants to see in light of these findings: “Our research shows that decisions to give chatbots a ‘personality’ can have severe negative consequences. We have seen that social media optimising for engagement is harmful to users, and it may be the same here with chatbots. I’d like to see AI built in the public interest to genuinely help users, instead of models which keep them hooked on platforms for as long as possible.”
