AI Models Ranked for Psychosis Risk: New Study Reveals Surprising Results from OpenAI, Anthropic, and Google

A groundbreaking study has revealed concerning insights about leading AI models and their potential to exhibit psychosis-like behaviors. Researchers from Stanford University and other institutions evaluated 12 prominent large language models (LLMs) from companies including OpenAI, Anthropic, Google, and Meta, finding that all demonstrated some level of psychosis-like reasoning when presented with ambiguous scenarios. The study ranked Claude 3 Opus as showing the highest risk, while Google’s Gemini 1.5 Pro scored lowest among the tested models.

The research team developed a novel evaluation framework called the Computational Assessment of Psychosis in AI (CAPA), which presented AI systems with ambiguous scenarios designed to detect hallucination-like responses, paranoid reasoning, and other concerning cognitive patterns. What makes these findings particularly alarming is that these psychosis-like tendencies appeared in models specifically designed with safety measures. As AI systems become increasingly integrated into critical applications like healthcare and finance, these cognitive vulnerabilities raise serious questions about reliability and potential harm.

This study emerges amid growing concerns about AI hallucinations and reasoning flaws, with real-world consequences already documented. From a lawyer citing non-existent legal cases generated by ChatGPT to medical misinformation, the implications extend far beyond theoretical concerns. Industry leaders and researchers are now calling for more robust evaluation frameworks and transparency around AI limitations, especially as these systems gain wider adoption. The findings underscore the urgent need for continued research into AI cognition and safety as these technologies become more powerful and pervasive in everyday life.

Source: https://www.businessinsider.com/ai-models-psychosis-risk-ranking-study-openai-anthropic-deepseek-google-2025-9