
ChatGPT’s adherence to otolaryngology clinical practice guidelines

  • Short Communication
  • Published:
European Archives of Oto-Rhino-Laryngology

Abstract

Objectives

Large language models, including ChatGPT, have the potential to transform the way we approach medical knowledge, yet accuracy on clinical topics is critical. Here we assessed ChatGPT’s performance in adhering to the clinical practice guidelines of the American Academy of Otolaryngology-Head and Neck Surgery.

Methods

We presented ChatGPT with 24 clinical otolaryngology questions based on the guidelines of the American Academy of Otolaryngology-Head and Neck Surgery. Each question was presented three times (N = 72) to test the model’s consistency. Two otolaryngologists evaluated the responses for accuracy and relevance to the guidelines. Cohen’s kappa was used to measure inter-evaluator agreement, and Cronbach’s alpha assessed the consistency of ChatGPT’s responses across the repeated runs.
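As an illustration of the two statistics named above, a minimal pure-Python sketch with hypothetical ratings (the study’s actual scoring data are not reproduced here, and the category labels and score scale below are invented for the example):

```python
from statistics import pvariance

def cohens_kappa(rater1, rater2):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    n = len(rater1)
    categories = set(rater1) | set(rater2)
    # Observed agreement: fraction of items both raters labelled identically.
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Expected chance agreement, from each rater's marginal frequencies.
    p_e = sum((rater1.count(c) / n) * (rater2.count(c) / n) for c in categories)
    return (p_o - p_e) / (1 - p_e)

def cronbach_alpha(runs):
    """Cronbach's alpha: internal consistency of k repeated score lists."""
    k = len(runs)
    # Total score per question, summed over the k repeated runs.
    totals = [sum(scores) for scores in zip(*runs)]
    per_run_variance = sum(pvariance(r) for r in runs)
    return (k / (k - 1)) * (1 - per_run_variance / pvariance(totals))

# Hypothetical accuracy labels from two evaluators for six responses.
eval1 = ["high", "high", "partial", "contradict", "high", "partial"]
eval2 = ["high", "high", "partial", "partial", "high", "high"]
print(round(cohens_kappa(eval1, eval2), 2))   # → 0.4

# Hypothetical 1-3 accuracy scores for four questions over three runs.
runs = [[3, 2, 3, 1], [3, 2, 3, 1], [3, 1, 3, 2]]
print(round(cronbach_alpha(runs), 2))         # → 0.9
```

Both measures are bounded above by 1; values near 1 indicate near-perfect agreement (kappa) or near-identical repeated responses (alpha), matching how the paper reports alpha = 0.87 as reasonable consistency.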

Results

The study revealed mixed results: 59.7% (43/72) of ChatGPT’s responses were highly accurate, while only 2.8% (2/72) directly contradicted the guidelines. The model showed 100% accuracy in Head and Neck, but lower accuracy in Rhinology and Otology/Neurotology (66%), Laryngology (50%), and Pediatrics (8%). Responses were consistent across the three repetitions for 17/24 questions (70.8%), with a Cronbach’s alpha of 0.87, indicating reasonable consistency across tests.

Conclusions

Using a guideline-based set of structured questions, ChatGPT demonstrates consistency but variable accuracy in otolaryngology. Its lower performance in some areas, especially Pediatrics, suggests that further rigorous evaluation is needed before considering real-world clinical use.


Fig. 1

Data availability

Data supporting this study are included within the article and further data will be available upon reasonable request.


Funding

None.

Author information


Corresponding author

Correspondence to Idit Tessler.

Ethics declarations

Conflict of interest

All authors declare no conflict of interest in connection with this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (DOCX 47 KB)


About this article


Cite this article

Tessler, I., Wolfovitz, A., Alon, E.E. et al. ChatGPT’s adherence to otolaryngology clinical practice guidelines. Eur Arch Otorhinolaryngol 281, 3829–3834 (2024). https://doi.org/10.1007/s00405-024-08634-9
