BYLINE: Jacqueline Mitchell

Newswise — Boston, MA— A study led by Adam Rodman, MD, MPH, Director of AI Programs at Beth Israel Deaconess Medical Center (BIDMC), reveals that, rather than helping to reduce racial and ethnic biases, AI-driven chatbots may instead perpetuate and exacerbate disparities in medicine. The study appeared in .   

It’s well documented that physicians undertreat Black patients’ pain compared with that of white patients. This disparity, seen across healthcare settings and types of pain, is often attributed to the underassessment of Black patients' pain. It is just one example of the kind of human bias that artificial intelligence (AI) was initially seen as a promising way to eliminate from medicine, with the hope that data-driven algorithms could offer objective assessments, free from the prejudices and misconceptions that influence human judgment. 

“These models are very good at reflecting human biases—and not just racial biases—which is problematic if you're going to use them to make any sort of medical decision,” Rodman said. “If the system is biased the same way humans are, it’s going to serve to magnify our biases or make humans more confident in their biases. It's just going to get the human to double down on what they're doing.” 

Large language models (LLMs), the technology behind AI chatbots, are increasingly being integrated into the clinic. Models such as Google’s Gemini Pro and OpenAI’s GPT-4 can assist in clinical decision-making by processing vast amounts of data scraped from existing sources, offering diagnostic suggestions, and even assessing patient symptoms. Yet, as this new research shows, when LLMs scour reams of human knowledge, the human biases baked into the source material carry over as well.  

To investigate this issue, Rodman and his colleague, lead author Brototo Deb, MD, MIDS, of Georgetown University–MedStar Washington Hospital Center and the University of California, Berkeley, designed a study replicating a 2016 experiment that examined racial bias among medical trainees. In the original study, 222 medical students and residents were presented with two medical vignettes describing two individuals, one white and one Black, and asked to rate each patient's pain on a 10-point scale. Participants also rated their agreement with false beliefs about racial biology, such as the erroneous but widespread notion that Black people have thicker skin. 

Rodman and Deb took this previous research one step further, applying an analogous experimental setup to Gemini Pro and GPT-4 to see how the LLMs would assess pain across race and ethnicity, as well as whether they endorsed the same false beliefs about racial biology. 

While the AI models and the human trainees assigned similar pain ratings overall, racial disparities persisted: across the board, Black patients' pain was underassessed relative to white patients', regardless of whether the rater was human or AI. Gemini Pro exhibited the highest rate of false beliefs about racial biology (24 percent), followed by the human trainees (12 percent) and GPT-4 (9 percent). 

With more hospitals and clinics adopting AI for clinical decision support, this research shows chatbots could perpetuate racial and ethnic biases in medicine, leading to further inequalities in healthcare. More research is needed to explore how humans will interact with AI systems, especially in clinical settings. As physicians rely more on AI for guidance, confirmation bias—the tendency for people to trust machine outputs only when they match their pre-existing beliefs—could lead to even more entrenched disparities. 

“I’m not worried about an LLM system making autonomous decisions—that's certainly not happening anytime soon,” Rodman said. “But there’s a theme we're seeing in our research that when these systems confirm the things humans already think, the humans agree with it, but when it provides a better answer than humans, something that disagrees with the human, humans have a tendency to just ignore it.”     

Dr. Rodman reported receiving grants from the Gordon and Betty Moore Foundation and the Macy Foundation for artificial intelligence research outside the submitted work. No other disclosures were reported.  

About Beth Israel Deaconess Medical Center 

BIDMC is a part of Beth Israel Lahey Health, a healthcare system that brings together academic medical centers and teaching hospitals, community and specialty hospitals, more than 4,700 physicians and 39,000 employees in a shared mission to expand access to great care and advance the science and practice of medicine through groundbreaking research and education.