Opinion

Large language models are not fit for clinical practice … and other research

BMJ 2024; 386 doi: https://doi.org/10.1136/bmj.q1507 (Published 11 July 2024) Cite this as: BMJ 2024;386:q1507
Tom Nolan, clinical editor; sessional GP, Surrey
The BMJ, London

Tom Nolan reviews this week’s research

LLMs not fit to practise

Reviewing 360° assessment feedback from colleagues can be nerve-racking, particularly if you had to ask someone who you know doesn’t think much of your clinical skills (or just doesn’t like you) in order to get the required number of feedback forms. If large language model (LLM) AI diagnosticians had to be revalidated they’d probably be fine, as there’d be no shortage of clinicians from frothy start-ups, each hoping to become the next health tech unicorn, willing to give them positive feedback. However, if they asked the authors of a new diagnostic accuracy study in Nature Medicine, they might find themselves in trouble. These authors conclude that “current state-of-the-art LLMs do not accurately diagnose patients across all pathologies (performing significantly worse than physicians), follow neither diagnostic nor treatment guidelines, and cannot interpret laboratory results, thus posing a serious risk to the health of patients.” Not only are LLMs poor at making a diagnosis, “they cannot be easily integrated into existing workflows because they often fail to follow instructions.” Ouch!

Nat Med doi:10.1038/s41591-024-03097-1

Benefits of intensive blood pressure control

How low should you go if you have hypertension and a high cardiovascular risk? We’ve heard from three major trials already (ACCORD, RESPECT, and SPRINT), and now there’s a fourth: ESPRIT. This open-label trial recruited over 11 000 people in China who had hypertension and high cardiovascular risk. They were randomised to intensive blood pressure control (targeting a systolic blood pressure of <120 mm Hg) or a standard systolic blood pressure target of <140 mm Hg. The researchers found that, after a median follow-up of 3.4 years, the intensive target led to slightly lower rates of the primary endpoint of myocardial infarction, revascularisation, hospital admission for heart failure, stroke, or death from cardiovascular causes: 9.7% versus 11.1% (hazard ratio 0.88, 95% confidence interval 0.78 to 0.99). There was no difference in serious adverse events between the groups, apart from a small increase in syncope in the intensive blood pressure control group (0.4% v 0.1%).
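To put “slightly lower” into numbers, the short Python sketch below turns the percentages quoted above into an absolute risk reduction and a number needed to treat, and does the same for syncope as a number needed to harm. It is a back-of-envelope illustration only: the function names are hypothetical, and it treats the reported event rates as crude proportions over the median 3.4 years of follow-up, which is an assumption on my part and not how the trial analysed its time-to-event data (the hazard ratio of 0.88 comes from that analysis, not from this arithmetic).

```python
def absolute_risk_difference(rate_a: float, rate_b: float) -> float:
    """Difference between two event rates expressed as proportions (0 to 1)."""
    return rate_a - rate_b


def number_needed(rate_higher: float, rate_lower: float) -> float:
    """Reciprocal of the absolute risk difference.

    Read as number needed to treat when the treatment lowers the rate,
    or number needed to harm when the treatment raises it.
    """
    difference = absolute_risk_difference(rate_higher, rate_lower)
    if difference <= 0:
        raise ValueError("rate_higher must exceed rate_lower")
    return 1 / difference


# Primary endpoint rates as quoted in the text (assumed crude proportions).
standard_primary = 0.111   # standard target group
intensive_primary = 0.097  # intensive target group

# Syncope rates as quoted in the text.
intensive_syncope = 0.004
standard_syncope = 0.001

arr = absolute_risk_difference(standard_primary, intensive_primary)
nnt = number_needed(standard_primary, intensive_primary)
nnh = number_needed(intensive_syncope, standard_syncope)

print(f"Absolute risk reduction: {arr * 100:.1f} percentage points")
print(f"Number needed to treat: about {round(nnt)}")
print(f"Number needed to harm (syncope): about {round(nnh)}")
```

On these figures, roughly 71 people would need the intensive target over the follow-up period to prevent one primary endpoint event, against one extra episode of syncope for roughly every 333 treated.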

The generalisability of the major trials of intensive blood pressure control has been discussed at length. ESPRIT excluded anyone with a 1 minute standing systolic blood pressure of <110 mm Hg and used supervised, office based blood pressure readings (where the average of three readings, taken a minute apart after a quiet rest of at least 5 minutes, was used), in contrast to the unsupervised blood pressure readings done in the SPRINT study and increasingly in clinical practice.

Lancet doi:10.1016/S0140-6736(24)01028-6

Kidney function in extreme heat

I’ve often wondered about my kidneys when the weather gets too hot, and it’s common to see patients unwell with acute kidney injury during heat waves. Researchers examined the effect of extreme heat on kidney function by testing blood samples from healthy volunteers who went into a chamber heated to 47°C at 15% humidity (sauna) or 40% humidity (hot yoga studio) and did bouts of light activity over a three hour period. In the hot dry setting the volunteers had increases in blood markers of kidney function (creatinine and cystatin C), which were more marked in the older (>65 years old) volunteers. In the hot humid conditions, however, kidney function markers didn’t significantly change. I’m not sure what this means for managing the effects of extreme heat, but, reassured by the apparent protection from humidity, I might finally hatha go at Bikram yoga.

JAMA doi:10.1001/jama.2024.9845

Death rates and ECG screening

In Japan all employees over the age of 35 years are offered annual health screening, including an electrocardiogram (ECG). A cohort study examined the link between abnormalities in these screening ECGs and a composite outcome of all-cause death and hospital admission due to cardiovascular disease. This outcome occurred more often over a median five year follow-up period in people with major abnormalities on their ECG than in those with no abnormalities (adjusted hazard ratio 1.96, 95% CI 1.92 to 2.02). Although that might seem like a good argument for ECG screening, the study doesn’t tell us whether the higher risks are due to the cardiac disease indicated by the abnormal ECGs or to the interventions offered to investigate them, or how many of the abnormalities identified came with an effective intervention. However, with around 20% of people having at least one (minor or major) abnormality on their ECG, introducing ECG population screening would seem a quick win for anyone looking to add more low value work to primary care.

JAMA Intern Med doi:10.1001/jamainternmed.2024.2270

Weight gain with antidepressants

Weight gain is a common concern for people considering taking antidepressants. An observational cohort study in the US looked at the relative weight change of 183 118 people prescribed antidepressants between 2010 and 2019. Compared with those prescribed sertraline, greater weight gain (an average of up to 0.5 kg over 2 years) was found in those prescribed escitalopram, paroxetine, duloxetine, venlafaxine, or citalopram. Weight gain with fluoxetine was similar to that with sertraline, and the only antidepressant that seems to offer slightly lower levels of weight gain compared with sertraline was the one that isn’t licensed for treatment of depression in the UK: bupropion.

Ann Intern Med doi:10.7326/M23-2742

Footnotes

  • Competing interests: None declared

  • Provenance and peer review: Not commissioned; not peer reviewed