Digit Health. 2023 Aug 21:9:20552076231194929.
doi: 10.1177/20552076231194929. eCollection 2023 Jan-Dec.

How suitable are clinical vignettes for the evaluation of symptom checker apps? A test theoretical perspective

Marvin Kopka et al. Digit Health.

Abstract

Objective: To evaluate the ability of case vignettes to assess the performance of symptom checker applications and to suggest refinements to the methodology used in case vignette-based audit studies.

Methods: We re-analyzed the publicly available data of two prominent case vignette-based symptom checker audit studies by calculating common metrics of test theory. Furthermore, we developed a new metric, the Capability Comparison Score (CCS), which compares symptom checker capability while controlling for the difficulty of the set of cases each symptom checker evaluated. We then scrutinized whether applying test theory and the CCS altered the performance ranking of the investigated symptom checkers.

Results: In both studies, most symptom checkers changed their rank order when adjusting the triage capability for item difficulty (ID) with the CCS. The previously reported triage accuracies commonly overestimated the capability of symptom checkers because they did not account for the fact that symptom checkers tend to selectively appraise easier cases (i.e., with high ID values). Also, many case vignettes in both studies showed insufficient (very low and even negative) values of item-total correlation (ITC), suggesting that individual items or the composition of item sets are of low quality.

Conclusions: A test-theoretic perspective helps identify previously undetected threats to the validity of case vignette-based symptom checker assessments and provides guidance and specific metrics to improve the quality of case vignettes, in particular by controlling for the difficulty of the vignettes an app was (not) able to evaluate correctly. Such measures might prove more meaningful than accuracy alone for the competitive assessment of symptom checkers. Our approach helps elaborate and standardize the methodology used for appraising symptom checker capability, which, ultimately, may yield more reliable results.
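The two test-theory metrics named in the Results can be illustrated with a small sketch. It assumes the standard classical test theory definitions: item difficulty (ID) as the share of symptom checkers that triaged a vignette correctly, and a corrected item-total correlation (ITC) as the Pearson correlation between a vignette's score and each app's mean score on its remaining vignettes. The data, the app names, and the function names are hypothetical, and the sketch does not reproduce the authors' analysis or the CCS formula, which the abstract does not give.

```python
from math import sqrt

# Hypothetical toy data: one row per symptom checker, one column per vignette.
# 1 = correct triage, 0 = incorrect, None = vignette not evaluated by that app.
results = {
    "app_a": [1, 1, 0, 1],
    "app_b": [1, 0, 0, None],
    "app_c": [1, 1, 1, 0],
    "app_d": [0, 1, 0, None],
}


def item_difficulty(results, item):
    """Share of apps that solved the vignette, among apps that evaluated it."""
    scores = [row[item] for row in results.values() if row[item] is not None]
    return sum(scores) / len(scores)


def pearson(xs, ys):
    """Pearson correlation; returns None when either variable is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


def item_total_correlation(results, item):
    """Corrected ITC: item score vs. mean score on all other evaluated items."""
    item_scores, rest_scores = [], []
    for row in results.values():
        if row[item] is None:
            continue  # this app never evaluated the vignette
        rest = [s for i, s in enumerate(row) if i != item and s is not None]
        if not rest:
            continue  # no other items to compare against
        item_scores.append(row[item])
        rest_scores.append(sum(rest) / len(rest))
    if len(item_scores) < 2:
        return None
    return pearson(item_scores, rest_scores)


for item in range(4):
    print(item, item_difficulty(results, item), item_total_correlation(results, item))
```

In this toy data, app_b and app_d skip the fourth vignette, so their raw accuracy is computed over a different set of cases than that of app_a and app_c. This is the kind of confounding that an adjustment for item difficulty is meant to address.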

Keywords: Digital health; care navigation; case vignettes; methodology; patient-centered care; self-triage; symptom checker; test theory; urgency assessment.

Conflict of interest statement

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Figures

Figure 1. Methodological procedure of the present study.
Figure 2. Explanation of the CCS formula components.
Figure 3. Distribution of item difficulty of the case vignettes used in both studies.
Figure 4. Density of item-total correlation by study.
Figure 5. Density of item-total correlation by study for each triage level. Note. Hill et al. used four triage levels (including Non-urgent care), while Semigran et al. used only three.
Figure 6. Comparison of item difficulty and item-total correlation of vignettes used in both studies. Note. The dashed blue line indicates a linear model.
Figure 7. Procedure for reporting the symptom checker performance in future studies.
Figure 1 of the Appendix. Note. Not all symptom checkers appraised the same set of vignettes. This figure shows that the accuracy of symptom checkers depends on the vignettes that were entered and is confounded by the vignettes’ item difficulty.

References

    1. Semigran HL, Linder JA, Gidengil C, et al. Evaluation of symptom checkers for self diagnosis and triage: audit study. Br Med J 2015; 351: h3480.
    2. Ceney A, Tolond S, Glowinski A, et al. Accuracy of online symptom checkers and the potential impact on service utilisation. PloS One 2021; 16: e0254088.
    3. Kopka M, Scatturin L, Napierala H, et al. Characteristics of users and nonusers of symptom checkers in Germany: cross-sectional survey study. J Med Internet Res 2023; 25: e46231.
    4. Mueller J, Jay C, Harper S, et al. Web use for symptom appraisal of physical health conditions: a systematic review. J Med Internet Res 2017; 19: e202.
    5. EPatient Analytics GmbH. EPatient Survey 2020, https://www.hcm-magazin.de/epatient-survey-2020-digital-health-studie/15... (2020, accessed 6 March 2021).