Opinion

Large language models are not fit for clinical practice … and other research

BMJ 2024; 386 doi: https://doi.org/10.1136/bmj.q1507 (Published 11 July 2024) Cite this as: BMJ 2024;386:q1507

Tom Nolan, clinical editor; sessional GP, Surrey

The BMJ, London

Tom Nolan reviews this week’s research

LLMs not fit to practise

Reviewing 360° assessment feedback from colleagues can be nerve-racking, particularly if you had to ask someone who you know doesn’t think much of your clinical skills (or just doesn’t like you) in order to get the required number of feedback forms. If large language model (LLM) AI diagnosticians had to be revalidated they’d probably be fine, as they’d have no shortage of clinicians from frothy start-ups hoping to become the next health tech unicorn willing to give them positive feedback. However, if they asked the authors of a new diagnostic accuracy study in Nature Medicine, they might find themselves in trouble. These authors conclude that “current state-of-the-art LLMs do not accurately diagnose patients across all pathologies (performing significantly worse than physicians), follow neither diagnostic nor treatment guidelines, and cannot interpret laboratory results, thus posing a serious risk to the health of patients.” Not only are LLMs poor at making a diagnosis, “they cannot be easily integrated into existing workflows because they often fail to follow instructions.” Ouch!

Nat Med doi:10.1038/s41591-024-03097-1

Benefits of intensive blood pressure control

How low should you go if you have hypertension and a high cardiovascular risk? We’ve heard from three major trials already (ACCORD, RESPECT, and SPRINT), and now there’s a fourth: ESPRIT. This open-label trial recruited over 11 000 people in China who had hypertension and high cardiovascular risk. They were randomised to intensive blood pressure control (targeting a systolic blood pressure of <120 mm Hg) or a standard systolic blood pressure target of <140 mm Hg. The researchers found that, after a median follow-up of 3.4 years, intensive blood pressure targets led to slightly lower rates of the primary endpoint of myocardial infarction, revascularisation, hospital admission for heart failure, stroke, or death from cardiovascular causes: 9.7% versus 11.1% (hazard ratio 0.88, 95% confidence interval 0.78 to 0.99). There was no difference in serious adverse events between the groups, apart from a small increase in syncope in the intensive blood pressure control group (0.4% v 0.1%).
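To put those percentages in more familiar terms, here is a rough sketch of the absolute differences and the corresponding numbers needed to treat and to harm. It uses only the crude proportions quoted above and ignores the time-to-event analysis behind the hazard ratio, so treat the results as back-of-envelope figures over the median 3.4 years of follow-up rather than anything reported by the trial. The function names are illustrative, not taken from the study.

```python
import math


def absolute_difference(rate_a: float, rate_b: float) -> float:
    """Absolute difference between two event proportions (rate_a minus rate_b)."""
    return rate_a - rate_b


def number_needed(rate_difference: float) -> float:
    """Reciprocal of an absolute rate difference (unrounded)."""
    if rate_difference == 0:
        raise ValueError("No difference between groups, so the number needed is undefined")
    return 1 / abs(rate_difference)


# Crude proportions quoted in the text (assumption: treated as simple risks).
primary_standard = 0.111   # primary endpoint, standard target
primary_intensive = 0.097  # primary endpoint, intensive target
syncope_intensive = 0.004  # syncope, intensive target
syncope_standard = 0.001   # syncope, standard target

# Benefit: fewer primary endpoint events with intensive control.
arr = absolute_difference(primary_standard, primary_intensive)
nnt = math.ceil(number_needed(arr))  # rounded up, by convention

# Harm: more syncope with intensive control.
ari = absolute_difference(syncope_intensive, syncope_standard)
nnh = number_needed(ari)

print(f"Absolute risk reduction: {arr * 100:.1f} percentage points")
print(f"Number needed to treat: about {nnt}")
print(f"Absolute increase in syncope: {ari * 100:.1f} percentage points")
print(f"Number needed to harm (syncope): about {round(nnh)}")
```

On these crude figures, roughly one primary endpoint event is avoided for every 70 or so people given the intensive target, against one extra episode of syncope for every few hundred.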

The generalisability of the major trials of intensive blood pressure control has been discussed at length. ESPRIT excluded anyone with a 1 minute standing systolic blood pressure of <110 mm Hg and used supervised, office based blood pressure readings (the average of three readings, taken a minute apart after at least 5 minutes of quiet rest), in contrast to the unsupervised readings done in the SPRINT study and increasingly in clinical practice.

Lancet doi:10.1016/S0140-6736(24)01028-6

Kidney function in extreme heat

I’ve often wondered about my kidneys when the weather gets too hot, and it’s common to see patients unwell with acute kidney injury during heat waves. Researchers examined the effect of extreme heat on kidney function by testing blood samples from healthy volunteers who went into a chamber heated to 47°C and 15% humidity (sauna) or 40% humidity (hot yoga studio) and did bouts of light activity over a three hour period. In the hot dry setting the volunteers had increases in blood markers of kidney function (creatinine and cystatin C), which were more marked in the older (>65 years old) volunteers. In the hot humid conditions, however, kidney function markers didn’t significantly change. I’m not sure what this means for managing the effects of extreme heat, but, reassured by the apparent protection from humidity, I might finally hatha go at Bikram yoga.

JAMA doi:10.1001/jama.2024.9845

Death rates and ECG screening

In Japan all employees over the age of 35 years are offered annual health screening, including an electrocardiogram (ECG). A cohort study examined the link between abnormalities in these screening ECGs and a composite of all-cause death and hospital admission due to cardiovascular disease. This composite outcome occurred more often over a median five year follow-up period in people with major abnormalities on their ECG than in those with no abnormalities (adjusted hazard ratio 1.96, 95% CI 1.92 to 2.02). Although that might seem like a good argument for ECG screening, the study doesn’t tell us whether the higher risks are due to cardiac disease indicated by the abnormal ECGs or are a result of the interventions offered to investigate them, or how many of the abnormalities identified came with an effective intervention. However, with around 20% of people having at least one (minor or major) abnormality on their ECG, introducing ECG population screening would seem a quick win for anyone looking to add more low value work to primary care.

JAMA Intern Med doi:10.1001/jamainternmed.2024.2270

Weight gain with antidepressants

Weight gain is a common concern for people considering taking antidepressants. An observational cohort study in the US looked at the relative weight change of 183 118 people prescribed antidepressants between 2010 and 2019. Compared with those prescribed sertraline, greater weight gain (an average of up to 0.5 kg over 2 years) was found in those prescribed escitalopram, paroxetine, duloxetine, venlafaxine, or citalopram. Weight gain with fluoxetine was similar to that with sertraline, and the only antidepressant that seems to offer slightly lower levels of weight gain compared with sertraline was the one that isn’t licensed for treatment of depression in the UK: bupropion.

Ann Intern Med doi:10.7326/M23-2742

Footnotes

Competing interests: None declared
Provenance and peer review: Not commissioned; not peer reviewed

See other articles in issue 8435
