ChatGPT-4 generates orthopedic discharge documents faster than humans maintaining comparable quality: a pilot study of 6 cases

Authors

  • Guillermo Sánchez-Rosenberg, Department of Orthopedic and Trauma Surgery, University Hospital Basel, Switzerland
  • Martin Magnéli, Karolinska Institute, Department of Clinical Sciences at Danderyd Hospital, Stockholm, Sweden
  • Niklas Barle, Karolinska Institute, Department of Clinical Sciences at Danderyd Hospital, Stockholm, Sweden
  • Michael G Kontakis, Department of Surgical Sciences, Orthopedics, Uppsala University Hospital, Uppsala, Sweden
  • Andreas Marc Müller, Department of Orthopedic and Trauma Surgery, University Hospital Basel, Switzerland
  • Matthias Wittauer, Department of Orthopedic and Trauma Surgery, University Hospital Basel, Switzerland
  • Max Gordon, Karolinska Institute, Department of Clinical Sciences at Danderyd Hospital, Stockholm, Sweden, https://orcid.org/0000-0002-8080-5815
  • Cyrus Brodén, Department of Surgical Sciences, Orthopedics, Uppsala University Hospital, Uppsala, Sweden

DOI:

https://doi.org/10.2340/17453674.2024.40182

Keywords:

Administrative tasks, AI in orthopaedics, Artificial intelligence, ChatGPT, Discharge documents, Large language models, Orthopaedic surgery, Physician burnout

Abstract

Background and purpose: Large language models such as ChatGPT-4 hold the potential to reduce the administrative burden by generating everyday clinical documents, allowing physicians to spend more time with patients. We aimed to assess both the quality and the efficiency of discharge documents generated by ChatGPT-4 in comparison with those produced by physicians.
Patients and methods: To emulate real-world situations, the health records of 6 fictional orthopedic cases were created. Discharge documents for each case were generated by a junior attending orthopedic surgeon and an advanced orthopedic resident. ChatGPT-4 was then prompted to generate the discharge documents using the same health record information. The quality assessment was performed by an expert panel (n = 15) blinded to the source of the documents. As a secondary outcome, we logged and compared the time required to create the discharge documents by the physicians and by ChatGPT-4.
Results: Overall, ChatGPT-4-generated and physician-generated notes were comparable in quality. Notably, ChatGPT-4 generated discharge documents 10 times faster than the traditional method. 4 hallucination events were found in the ChatGPT-4-generated content, compared with 6 in the physician-produced notes.
Conclusion: ChatGPT-4 creates orthopedic discharge notes faster than physicians, with comparable quality. This suggests that it could streamline document generation in orthopedic care and substantially reduce the administrative burden on healthcare professionals.

 



Published

2024-03-21

How to Cite

Sánchez-Rosenberg, G., Magnéli, M., Barle, N., Kontakis, M. G., Müller, A. M., Wittauer, M., Gordon, M., & Brodén, C. (2024). ChatGPT-4 generates orthopedic discharge documents faster than humans maintaining comparable quality: a pilot study of 6 cases. Acta Orthopaedica, 95, 152–156. https://doi.org/10.2340/17453674.2024.40182