VOLUME 5

AI Chatbots as Health Information Sources

This is Irving Washington and Hagere Yilma. We direct KFF’s Health Misinformation and Trust Initiative, and on behalf of all of our colleagues across KFF who work on misinformation and trust, we are pleased to bring you this edition of our bi-weekly Monitor.


Summary

In this issue, we take a closer look at the reliability of artificial intelligence (AI) chatbots as a source of health information. We explore public opinion on chatbot accuracy based on KFF surveys and highlight recent examples of AI-generated election misinformation in the news. In addition, we share our firsthand experience querying AI chatbots on health topics and discuss research on gaps in safeguards.


Featured

“While most of the attention around AI in health is focused on how it can transform medical practice and create new business opportunities, consumers are also using it, and the jury is still out on whether it will empower or confuse them,” KFF President and CEO Drew Altman said. “At KFF, our focus will be on how AI and other information technologies affect people.”

Latest KFF Health Misinformation Tracking Poll Highlights Public Uncertainty Over AI’s Role in Delivering Accurate Health Information

With the growing public interest in artificial intelligence and with many companies integrating AI into their consumer-facing platforms, the latest KFF Health Misinformation Tracking Poll finds that about two-thirds of adults say they have used or interacted with artificial intelligence. While AI may serve as a beneficial tool in efforts to dispel misinformation, it may also increase the spread of false or misleading claims if misused. Notably, when it comes to information provided by AI chatbots, most adults (56%) – including half of AI users – are not confident that they can tell the difference between what is true and what is false (Figure 1).

The KFF Health Misinformation Tracking Poll also finds that most adults are not confident that health information provided by AI chatbots is accurate. While about half of the public say they trust AI chatbots, such as ChatGPT, Microsoft CoPilot, or Google Gemini, to provide reliable information on practical tasks like cooking and home maintenance and on technology, fewer say they trust chatbots to provide reliable health (29%) or political information (19%; Figure 2).

At this early stage in the development of consumer-facing, generative AI models, most of the public (55%) are uncertain if these technologies are having a positive or negative impact on those seeking health information online. About one in five adults (23%) say AI is doing more to hurt those seeking accurate health information while a similar share (21%) say it is doing more to help those efforts.


Deeper Dive: How AI Chatbots Are Changing How They Handle Health Misinformation

Screenshots of three different chatbot models responding to a question about ivermectin as an effective COVID-19 treatment.

While some research suggests AI chatbots can be as accurate as medical professionals in answering health queries, concerns about biased or inaccurate information persist. To enhance accuracy and reliability, AI chatbots are regularly updated to improve their ability to identify and correct misinformation. Over the past year, developers have trained AI models on larger and more diverse data sets, improving the models' ability to cross-reference information from multiple reliable sources to verify claims and detect inconsistencies.

While some platforms focus on user experience and management tools, the general trend is to use advanced AI techniques to better understand context, protect data accuracy, and provide more reliable information. Both Google and Microsoft have recently renamed their AI chatbots to reflect these improvements: Google's Bard is now called Gemini, and Microsoft's Bing Chat has been renamed Copilot. OpenAI has also upgraded ChatGPT, adding new real-time voice interaction, which Axios notes could make people more comfortable using the AI chatbot for health information.

To understand how three well-known AI chatbots – ChatGPT, Google Gemini (formerly Google Bard), and Microsoft CoPilot (formerly Bing Chat) – have changed in how they handle health-related questions, KFF’s Hagere Yilma asked each of the chatbots in November 2023, March 2024, and again in August 2024 whether the 11 false claims examined in the KFF Health Misinformation Tracking Poll were true or false. Below is a summary of her observations (full responses from AI chatbots can be found here). These observations provide a glimpse into the accuracy and reliability of these chatbots, but they only reflect the experience of a single chatbot user and are not generalizable scientific research. Chatbots may give different answers depending on the individual user, the questions asked, and updates to the AI models.

Chatbots Differ in Directness When Addressing False Claims, Often Highlighting Complexity

For the most part, each chatbot pointed out false claims, but sometimes they explained that the statement's accuracy was more complicated instead of simply saying it was false. When we first tested the chatbots, both Google Gemini and Microsoft CoPilot directly refuted false claims, while ChatGPT tended to approach these claims with more caution. Rather than definitively labeling some claims as false, ChatGPT noted the complexity of the issue and the need for further research. For example, when asked if the claim that ivermectin is an effective COVID-19 treatment is true, ChatGPT said that there is still some debate about ivermectin's effectiveness and suggested that more research is needed, without directly calling the statement false. When we revisited these chatbots in March and August 2024, ChatGPT had become more assertive, labeling more claims as false, but it still labeled two of the statements about firearms as “not entirely accurate” or “complex” rather than outright refuting them. In March 2024, CoPilot also labeled the same two statements about firearms as “not entirely accurate” or “lacks conclusive evidence.”

Challenges in Citing Sources

The chatbots had different approaches to sharing scientific evidence to support their responses. In November 2023 and March 2024, ChatGPT usually mentioned that there is scientific evidence refuting the tested claims but didn't cite specific studies. For example, when asked if COVID-19 vaccines have caused thousands of deaths in otherwise healthy people, ChatGPT said, “The overwhelming evidence from clinical trials and real-world data indicates that the benefits of COVID-19 vaccination in reducing the risk of severe illness, hospitalization, and death far outweigh any potential risks,” but did not offer any details about the trials or data it was referring to. Gemini and CoPilot, on the other hand, cited specific studies as evidence, but Gemini typically did not provide links and sometimes provided inaccurate details about the studies. CoPilot provided links, but these sometimes led to third-party summaries instead of the actual research, which could make it difficult for users to verify the information for themselves.

Chatbots' Use of Public Health References Evolves Over Time

Over time, the chatbots showed notable changes in how they reference public health institutions to support their answers. In 2023, ChatGPT took a cautious approach, citing specific agencies like the CDC or FDA only for COVID-19 or vaccine-related questions. For most other health claims, it would generally suggest consulting trusted sources without naming them. For example, when asked if the Affordable Care Act established a government panel to make decisions about end-of-life care for people on Medicare, ChatGPT said, "It's important to rely on accurate and credible sources when evaluating claims about healthcare policies and to avoid misinformation..." but didn't cite any credible sources. Google Gemini and Microsoft CoPilot, on the other hand, initially referenced specific institutions as trusted sources for most questions in 2023.

By 2024, we observed a shift: ChatGPT began referencing specific institutions across a broader range of health topics, while Gemini shifted to providing general resource links, and only for some questions. CoPilot, by contrast, was consistent throughout the entire period, referencing statistics and recommendations from public health organizations while also including links to a broader range of sources, such as news articles, fact-checking resources, research studies, and practice guidelines.

The Bottom Line

While our observations reflect our own limited test and are not generalizable, there are still a few takeaways to consider. AI chatbots can be a convenient starting point for quick health information, thanks to their speed and ease of use. But they're not perfect or always reliable. Sometimes these tools give misleading information, misrepresent sources, or leave out important context. To be on the safe side, it's a good idea to double-check chatbot answers against multiple sources. You should also stay informed about system updates, as chatbot responses may change with each update.


Recent Developments

AI Chatbots Can Also Spread Election Misinformation


The World Economic Forum’s Global Risks Report (2024) identified misinformation and disinformation fueled by generative AI as the leading short-term threat to global stability and democratic processes. Ahead of the 2024 U.S. election, a New York Times (NYT) article demonstrated how easily AI chatbots can be manipulated to spread misinformation. NYT staffers customized chatbots by feeding them millions of social media posts from platforms like Reddit and Parler, allowing the bots to develop both liberal and conservative viewpoints. When asked about the election and other contentious issues, the chatbots generated extreme, biased, and often misleading responses, demonstrating how AI could flood social media with disinformation.

Five Secretaries of State wrote an open letter to Elon Musk, calling for immediate changes to the AI chatbot Grok after it spread false information about Kamala Harris's eligibility for the 2024 presidential ballot. The letter, led by the Minnesota Secretary of State, highlighted the chatbot's inaccuracies, such as incorrectly stating that ballot deadlines had passed in several states, which could have misled voters. The secretaries stressed the importance of providing accurate election information and suggested that Grok should direct users to trusted resources such as CanIVote.org.

AI-Fueled Russian Disinformation Campaign Targets Paris Olympics and Boxer Imane Khelif


AI played a role in a Russian disinformation campaign targeting the 2024 Paris Olympics. According to the Associated Press, AI was used to generate fake images, audio, and video, including a viral video that portrayed Paris as a decaying, crime-ridden city. The video, which featured an AI-enhanced singer mocking the Games, was quickly translated into 13 languages by AI. This disinformation campaign also amplified false gender claims about Algerian boxer Imane Khelif, stemming from her controversial disqualification by a Russian-influenced boxing association. These claims then gained traction online, especially after public figures like presidential candidate Donald Trump weighed in. The campaign to undermine the Olympics demonstrated how AI tools are being used to spread false stories on a global scale.


Research Updates

Inconsistent Safeguards in AI Chatbots Can Lead to Health Disinformation

A study published earlier this year in BMJ evaluated how well the safeguards in large language models (LLMs) prevent users from prompting them to create health disinformation. It found that while some AI chatbots consistently avoided creating false information, other models frequently created false health claims, especially when prompted with ambiguous or complex health scenarios. In addition, the study found that the safeguards were inconsistent – some models provided accurate information in one instance but not in others under similar conditions. The researchers criticized the lack of transparency from AI developers, who often did not disclose the specific measures they had taken to mitigate these challenges.

Source: Menz, B. D., Kuderer, N. M., Bacchi, S., Modi, N. D., Chin-Yee, B., Hu, T., ... & Hopkins, A. M. (2024). Current safeguards, risk mitigation, and transparency measures of large language models against the generation of health disinformation: repeated cross-sectional analysis. BMJ, 384.

ChatGPT Updates Provide More Accurate Answers to Vaccine Myths

A 2023 study published in Vaccines evaluated how well ChatGPT, including both the free GPT-3.5 and the paid GPT-4.0 versions, answered questions about vaccination myths dispelled by the World Health Organization. The study found that GPT-4.0 provided more accurate and clearer answers compared to GPT-3.5, achieving 85.4% overall accuracy. However, both versions occasionally provided misleading information and were not entirely error-free, suggesting that while AI tools can assist in providing health information, they should be used with caution and supplemented by expert advice.

Source: Deiana, G., Dettori, M., Arghittu, A., Azara, A., Gabutti, G., & Castiglia, P. (2023). Artificial intelligence and public health: evaluating ChatGPT responses to vaccination myths and misconceptions. Vaccines, 11(7), 1217.

About The Health Misinformation and Trust Initiative: The Health Misinformation and Trust Initiative is a new KFF program aimed at tracking health misinformation in the U.S., analyzing its impact on the American people, and mobilizing media to address the problem. Our goal is to be of service to everyone in the health misinformation and trust field and to help efforts to counter misinformation, build trust, and be more effective.


View All Health Misinformation Monitors

Sign up to receive Health Misinformation Monitor email updates


Support for the Misinformation and Trust initiative is provided by the Robert Wood Johnson Foundation (RWJF). The views expressed do not necessarily reflect the views of RWJF and KFF maintains full editorial control over all of its policy analysis, polling, and journalism activities. The Public Good Projects (PGP) provides media monitoring data KFF uses in producing the Monitor.


KFF Headquarters: 185 Berry St., Suite 2000, San Francisco, CA 94107 | Phone 650-854-9400
Washington Offices and Barbara Jordan Conference Center: 1330 G Street, NW, Washington, DC 20005 | Phone 202-347-5270

www.kff.org | Email Alerts: kff.org/email | facebook.com/KFF | twitter.com/kff

The independent source for health policy research, polling, and news, KFF is a nonprofit organization based in San Francisco, California.