AI Chatbots Still Fall Short in Giving Reliable Health Advice, Study Finds

Individuals are urged to evaluate information carefully and seek guidance from qualified healthcare professionals.

Thinking of consulting “Dr. ChatGPT” for your next health concern? You might want to think twice.

Despite their impressive ability to pass medical licensing exams and score highly on professional benchmarks, artificial intelligence chatbots are still unable to provide reliable health advice to the public, according to a new study published Monday in the journal Nature Medicine. Researchers found that AI tools perform no better than traditional internet searches when it comes to identifying illnesses or advising users on whether to seek medical care.

“Despite all the hype, AI just isn’t ready to take on the role of the physician,” said study co-author Rebecca Payne of Oxford University. She warned that relying on large language models for medical concerns can be risky, as chatbots may offer incorrect diagnoses and fail to recognize symptoms that require urgent medical attention.

Testing AI Against Real-World Health Concerns

The British-led research team set out to evaluate how effectively people could use AI chatbots to understand common health problems and decide whether medical attention was needed. Nearly 1,300 participants in the United Kingdom were presented with 10 realistic health scenarios, including a headache after a night of drinking, extreme exhaustion in a new mother, and the symptoms of gallstones.

Participants were randomly assigned to use one of three AI chatbots—OpenAI’s GPT-4o, Meta’s Llama 3, or Cohere’s Command R+—while a control group relied on traditional internet search engines.

The results were underwhelming. Those who used chatbots correctly identified their health condition only about one-third of the time. Even more concerning, only around 45 percent of participants chose the correct course of action, such as whether to see a doctor, go to the hospital, or manage the issue at home. Researchers noted that these outcomes were no better than those achieved by participants who used standard web searches.

Why Chatbots Perform Worse With Real People

The researchers highlighted a striking gap between how well AI performs in controlled testing environments and how it functions in real-life interactions with the public. While chatbots often excel in simulated patient exams and medical benchmarks, they struggle when dealing with actual users.

According to the study, many participants failed to provide complete or accurate information about their symptoms. In other cases, users misunderstood the chatbot’s responses, ignored important warnings, or found the advice confusing and difficult to interpret. This communication breakdown significantly reduced the effectiveness of the AI tools.

“AI systems may appear medically competent in theory, but real-world use exposes their limitations,” the researchers noted.

Growing Use of AI for Health Questions Raises Concerns

The study comes at a time when more people are turning to AI for health-related information. Researchers estimate that one in six adults in the United States asks AI chatbots about medical concerns at least once a month—a figure expected to rise as AI tools become more widely available and integrated into daily life.

Bioethicist David Shaw of Maastricht University, who was not involved in the study, described the findings as a serious public health warning. “This is a very important study because it highlights the real medical risks posed by chatbots,” he said. “People may develop a false sense of security and delay seeking professional help.”

Shaw emphasized that individuals should rely on trusted and authoritative medical sources when it comes to their health, such as official healthcare providers and public health institutions. In the UK, for example, experts recommend consulting the National Health Service for accurate and timely medical guidance.

A Tool, Not a Doctor

While AI chatbots continue to improve and may eventually play a supportive role in healthcare, researchers caution against viewing them as substitutes for trained medical professionals. For now, the study reinforces a simple message: AI can be a helpful source of general information, but when it comes to diagnosing symptoms and deciding on treatment, nothing replaces professional medical advice.

As the use of AI continues to grow, experts stress the need for stronger safeguards, clearer warnings, and better public education about the limitations of these technologies—especially when people’s health is at stake.