Why does ML in Medical Imaging work well only in theory?

I bet you have seen plenty of super cool articles about the promising future of AI in Medical Imaging. Maybe you have also seen announcements from the big tech giants and startups about a revolutionary new approach that will soon be available in your GP's office. But today, just as many years ago, doctors still use their eyes to detect abnormalities in images and still make human errors. The chart below shows that officially about 12% of adults in Germany have confirmed being victims of medical errors; in reality, this number is much larger, because not all error cases are discovered and documented.

[Chart: share of adults in Germany who report having experienced medical errors]


Understanding that this is a critical problem to solve, many research and commercial groups are doing their best to automate image diagnosis. Yet despite the good progress some of them report in research papers, in real clinical settings the adoption and effectiveness of these solutions are still far from what is needed. To be honest, there is no clinical solution for medical images available right now that works well enough without a human in the loop.

This article is not aimed at destroying the reputation of AI in Medical Imaging, but rather at underlining the changes the AI experimentation approach needs in order to deliver actual value for patients' health. We will skip the most obvious problems: lack of financing, difficult access to data, the separation of technicians from medical staff, and so on. Instead, I want to focus on less obvious but very important issues that hold back any project in this direction.

ALL DATA is biased. Datasets collected as part of a population study usually contain only patients with a higher incidence of disease and do not reflect the general population. The sensitivity and specificity of a model calibrated on such data will not carry over to clinical usage, where an average person must be taken into account. Dream-time: to overcome this problem, the training and testing data should follow the same distribution as the target population; these conditions are almost impossible to meet if the development group just uses whatever data the clinic has, without direct support from clinical managers to find the needed matching cases.
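A quick sketch of why this prevalence mismatch matters: with Bayes' rule you can compute how a model's positive predictive value (PPV) collapses when it moves from an enriched study cohort to a screening population. The sensitivity, specificity, and prevalence numbers below are illustrative assumptions, not figures from any real model.

```python
# Sketch: how a prevalence shift between a study cohort and the general
# population changes a model's positive predictive value (PPV).
# All numbers are illustrative assumptions.

def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Bayes' rule: P(disease | positive prediction)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# The exact same model, evaluated on two populations:
study = ppv(0.90, 0.90, prevalence=0.50)   # enriched study cohort
clinic = ppv(0.90, 0.90, prevalence=0.01)  # realistic screening population

print(f"PPV in the study cohort:   {study:.2f}")   # 0.90
print(f"PPV in a screening clinic: {clinic:.2f}")  # 0.08
```

Same sensitivity, same specificity, yet in the clinic most positive predictions would be false alarms.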

Not only humans can be biased; imaging devices can be too. Any kind of imperfection, additional labels, or quality degradation in the equipment can significantly affect the effectiveness of an ML model, especially if only high-quality data was used for training and testing. Dream-time: the ideal solution is to pre-train models for each care point separately, use the most diverse dataset possible, and re-train the model each time new equipment arrives. Accurate predictions, but a very expensive way to get them.

Overoptimistic evaluation of the models. Investors and the market are not willing to wait long and settle for "good enough" results, so teams adjust their evaluation methods to stay ahead of the game and grab attention in articles and among potential users. Problems like flawed cross-validation, performance metrics that do not reflect real-life use, and treating the accuracy score as the North Star of the project all together create a mess in the understanding of the product's real value. Dream-time: collaboration between teams and honest estimation of model performance on correctly prepared validation datasets.
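One concrete flavor of flawed cross-validation is patient-level leakage: when several image slices come from the same patient, a naive shuffle puts that patient on both sides of the split, and the model is rewarded for memorizing patients rather than pathology. A minimal sketch with made-up patient and slice names:

```python
# Sketch: why splits must be made at the patient level, not the image level.
# Patients, slice counts, and the split sizes are made up for illustration.
import random

# 10 hypothetical patients, 4 image slices each -> 40 samples
slices = [(f"patient_{p}", f"slice_{s}") for p in range(10) for s in range(4)]

random.seed(0)

# Naive image-level split: shuffle individual slices
shuffled = slices[:]
random.shuffle(shuffled)
train, test = shuffled[:30], shuffled[30:]
naive_leaked = {p for p, _ in train} & {p for p, _ in test}
print(f"patients leaked by the naive split:   {len(naive_leaked)}")

# Patient-level split: all slices of one patient stay on one side
patients = sorted({p for p, _ in slices})
random.shuffle(patients)
test_patients = set(patients[:3])
train = [x for x in slices if x[0] not in test_patients]
test = [x for x in slices if x[0] in test_patients]
leaked = {p for p, _ in train} & {p for p, _ in test}
print(f"patients leaked by the grouped split: {len(leaked)}")  # 0
```

With 10 test slices and 4 slices per patient, the naive split is guaranteed to cut at least one patient in half, so its test score tells you little about performance on unseen patients.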

Start easy, then go more complex. This is definitely not how AI in Medical Imaging is done today, as the majority of teams try to tackle the hardest-to-detect problems from day one. Here they face three problems:

  • Lack of data. Even if complicated cases are present, they are hard to label and the number of cases will be small. Augmentation can work to some extent, but not in all cases, and it will make your model even more biased (see point number 1 about biased data).
  • Hard to evaluate and validate model performance. A new loop of sweat and blood to collect a fresh data batch for validation with the same patient cohort.
  • Difficult to test the solution. The bar for delivering it into clinical reality is high, because in these cases a diagnostic error can cost the patient's life.

Dream-time: automate the easy routine and free doctors to apply their time to saving human lives.
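To illustrate the augmentation caveat from the list above: geometric augmentation, for example a horizontal flip, does create extra training samples, but each copy inherits every bias of its source image. A toy sketch on a 2x3 "image" stored as plain nested lists (no real imaging library assumed):

```python
# Toy augmentation: horizontal flip of a 2x3 "image" stored as nested lists.
# Real pipelines use imaging libraries; this only shows the idea.

def hflip(img):
    """Mirror each row, i.e. flip the image left-to-right."""
    return [row[::-1] for row in img]

img = [[1, 2, 3],
       [4, 5, 6]]

augmented = hflip(img)
print(augmented)  # [[3, 2, 1], [6, 5, 4]]
# The flipped copy carries the same scanner artifacts and cohort bias
# as the original, so augmentation cannot fix a biased dataset.
```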

As a dreamer and an AI nerd, I love medicine and human healthcare with all my heart. And I believe that all four of these problems can be solved within the next couple of years by building strong communication between the medical community and AI engineers. That's what I love to do, and what I encourage everyone to think about, because we need to support our best-ever medical staff with every available initiative!

Yours Technical_Girl ✌🏻
