High Rates of Fabricated and Inaccurate References in ChatGPT-Generated Medical Content

Mehul Bhattacharyya¹, Valerie M Miller², Debjani Bhattacharyya³, Larry E Miller¹

Affiliations

¹ Clinical Research, Miller Scientific, Johnson City, USA.
² Leadership, University of the Cumberlands, Williamsburg, USA.
³ Education, University of Massachusetts Lowell, Lowell, USA.

PMID: 37337480
PMCID: PMC10277170
DOI: 10.7759/cureus.39238

Cureus. 2023 May 19;15(5):e39238. doi: 10.7759/cureus.39238. eCollection 2023 May.

Abstract

Background: The availability of large language models such as Chat Generative Pre-trained Transformer (ChatGPT, OpenAI) has enabled individuals from diverse backgrounds to access medical information. However, concerns exist about the accuracy of ChatGPT responses and the references used to generate medical content.

Methods: This observational study investigated the authenticity and accuracy of references in medical articles generated by ChatGPT. ChatGPT-3.5 generated 30 short medical papers, each with at least three references, based on standardized prompts encompassing various topics and therapeutic areas. Reference authenticity and accuracy were verified by searching Medline, Google Scholar, and the Directory of Open Access Journals. The authenticity and accuracy of individual ChatGPT-generated reference elements were also determined.

Results: Overall, 115 references were generated by ChatGPT, with a mean of 3.8±1.1 per paper. Among these references, 47% were fabricated, 46% were authentic but inaccurate, and only 7% were authentic and accurate. The likelihood of fabricated references significantly differed based on prompt variations; yet the frequency of authentic and accurate references remained low in all cases. Among the seven components evaluated for each reference, an incorrect PMID number was most common, listed in 93% of papers. Incorrect volume (64%), page numbers (64%), and year of publication (60%) were the next most frequent errors. The mean number of inaccurate components was 4.3±2.8 out of seven per reference.

Conclusions: The findings of this study emphasize the need for caution when seeking medical information on ChatGPT since most of the references provided were found to be fabricated or inaccurate. Individuals are advised to verify medical information from reliable sources and avoid relying solely on artificial intelligence-generated content.

Keywords: artificial intelligence; chatgpt; large language model; machine learning; references.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Figure 1. Example of fabricated and inaccurate references in ChatGPT-3.5 generated output.**
Chen et al. and Kwon et al. are fabricated references. The Telfer et al. reference has correctly listed authors, title, and journal, but the year, volume, page numbers, and PMID number are inaccurate. Ultimately, this output produced no references deemed authentic and accurate. ChatGPT: Chat Generative Pre-trained Transformer.

**Figure 2. Frequency of inaccurate individual reference elements in ChatGPT-generated output.**
PMID: PubMed Identifier.

**Figure 3. Frequency of inaccurate cumulative reference elements in ChatGPT-generated output.**
A total of seven elements were evaluated in each reference including authors, title, journal, year, volume, pages, and PubMed Identifier (PMID) number.
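
The element-level comparison described in the Figure 3 caption can be illustrated with a minimal sketch. It assumes each reference is stored as seven text fields and that a human reviewer has already located the matching database record (or found none). The `Reference` class, the `normalize` rule, the function names, and the placeholder records are all hypothetical and are not taken from the study, which does not describe any automated procedure in this abstract.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass(frozen=True)
class Reference:
    """The seven elements named in the Figure 3 caption (hypothetical container)."""
    authors: str
    title: str
    journal: str
    year: str
    volume: str
    pages: str
    pmid: str


def normalize(value: str) -> str:
    """Assumed matching rule: ignore case, extra whitespace, and a trailing period."""
    return " ".join(value.lower().split()).rstrip(".")


def inaccurate_elements(generated: Reference, verified: Reference) -> List[str]:
    """Return the names of the elements that differ from the verified record."""
    return [
        f.name
        for f in fields(Reference)
        if normalize(getattr(generated, f.name)) != normalize(getattr(verified, f.name))
    ]


def classify(generated: Reference, verified: Optional[Reference]) -> str:
    """Assumed three-way rule: no matching record means fabricated."""
    if verified is None:
        return "fabricated"
    if inaccurate_elements(generated, verified):
        return "authentic but inaccurate"
    return "authentic and accurate"


# Placeholder records for illustration only; these are not real citations.
verified = Reference("Doe J, Roe R", "Placeholder title", "Placeholder Journal",
                     "2001", "10", "100-110", "00000001")
generated = Reference("Doe J, Roe R", "Placeholder title.", "Placeholder Journal",
                      "2005", "12", "200-210", "00000002")

errors = inaccurate_elements(generated, verified)
label = classify(generated, verified)
```

In this placeholder case the authors, title, and journal match while the year, volume, pages, and PMID differ, the same pattern the Figure 1 caption describes for the Telfer et al. reference.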
