Arthritis Res Ther. 2019 Dec 30;21(1):305.
doi: 10.1186/s13075-019-2092-7.

Rule-based and machine learning algorithms identify patients with systemic sclerosis accurately in the electronic health record

Lia Jamian et al. Arthritis Res Ther.

Abstract

Background: Systemic sclerosis (SSc) is a rare disease with studies limited by small sample sizes. Electronic health records (EHRs) represent a powerful tool to study patients with rare diseases such as SSc, but validated methods are needed. We developed and validated EHR-based algorithms that incorporate billing codes and clinical data to identify SSc patients in the EHR.

Methods: We used a de-identified EHR with over 3 million subjects and identified 1899 potential SSc subjects with at least 1 count of the SSc ICD-9 (710.1) or ICD-10-CM (M34*) codes. We randomly selected 200 as a training set for chart review. A subject was a case if diagnosed with SSc by a rheumatologist, dermatologist, or pulmonologist. We selected the following algorithm components based on clinical knowledge and available data: SSc ICD-9 and ICD-10-CM codes, positive antinuclear antibody (ANA) (titer ≥ 1:80), and a keyword of Raynaud's phenomenon (RP). We performed both rule-based and machine learning techniques for algorithm development. Positive predictive values (PPVs), sensitivities, and F-scores (which account for PPVs and sensitivities) were calculated for the algorithms.
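The rule-based approach described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Subject` record, its field names, the `rule_flags` and `ppv_and_sensitivity` helpers, and the five toy subjects are all hypothetical and are not data from the study. The sketch only shows how a rule built from code counts, ANA positivity, and the RP keyword could be scored against chart-review labels.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    """Hypothetical per-subject summary; field names are illustrative only."""
    ssc_code_count: int   # counts of ICD-9 710.1 or ICD-10-CM M34* codes
    ana_positive: bool    # ANA titer >= 1:80
    rp_keyword: bool      # Raynaud's phenomenon keyword found in the record
    is_case: bool         # chart-review label: SSc diagnosed by a specialist


def rule_flags(s, min_codes, require_ana=False, require_rp=False):
    """Return True when the subject satisfies the rule-based algorithm."""
    if s.ssc_code_count < min_codes:
        return False
    if require_ana and not s.ana_positive:
        return False
    if require_rp and not s.rp_keyword:
        return False
    return True


def ppv_and_sensitivity(subjects, **rule):
    """Score a rule against chart review; returns (PPV, sensitivity)."""
    tp = fp = fn = 0
    for s in subjects:
        flagged = rule_flags(s, **rule)
        if flagged and s.is_case:
            tp += 1          # flagged and truly a case
        elif flagged:
            fp += 1          # flagged but not a case
        elif s.is_case:
            fn += 1          # true case the rule missed
    ppv = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    return ppv, sensitivity


# Made-up toy subjects, NOT study data.
toy = [
    Subject(5, True, True, True),
    Subject(4, False, True, True),
    Subject(1, True, False, True),
    Subject(1, False, False, False),
    Subject(4, False, False, False),
]

codes_only = ppv_and_sensitivity(toy, min_codes=4)
codes_and_ana = ppv_and_sensitivity(toy, min_codes=3, require_ana=True)
```

On the toy data, requiring ANA positivity removes the false positive but also drops true cases, which is the same kind of PPV versus sensitivity trade-off the study evaluates.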

Results: PPVs were low for algorithms using only 1 count of the SSc ICD-9 code. As code counts increased, the PPVs increased. PPVs were higher for algorithms using ICD-10-CM codes versus the ICD-9 code. Adding a positive ANA and RP keyword increased the PPVs of algorithms only using ICD billing codes. Algorithms using ≥ 3 or ≥ 4 counts of the SSc ICD-9 or ICD-10-CM codes and ANA positivity had the highest PPV at 100% but a low sensitivity at 50%. The algorithm with the highest F-score of 91% was ≥ 4 counts of the ICD-9 or ICD-10-CM codes with an internally validated PPV of 90%. A machine learning method using random forests yielded an algorithm with a PPV of 84%, sensitivity of 92%, and F-score of 88%. The most important feature was the RP keyword.
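As a quick check on how the reported metrics relate, the sketch below assumes the F-score is the harmonic mean of PPV and sensitivity (the usual F1 definition; the abstract says only that it accounts for both). The `f_score` helper is illustrative. Under that assumption, the random forest's reported PPV and sensitivity reproduce its reported F-score. The value for the ANA-based algorithms is derived here from the formula and is not a figure reported in the abstract.

```python
def f_score(ppv, sensitivity):
    """Harmonic mean of PPV and sensitivity (assumed F1 definition)."""
    if ppv + sensitivity == 0:
        return 0.0
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# Random forest: PPV 84%, sensitivity 92% (values from the abstract).
rf = f_score(0.84, 0.92)
print(f"{rf:.0%}")    # 88%, matching the reported F-score

# Codes plus ANA positivity: PPV 100%, sensitivity 50% (values from the abstract).
ana = f_score(1.00, 0.50)
print(f"{ana:.0%}")   # 67%, derived here, not reported by the authors
```

Because the harmonic mean is pulled toward the lower of the two inputs, a rule with perfect PPV but 50% sensitivity scores well below one that balances both.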

Conclusions: Algorithms using only ICD-9 codes did not perform well to identify SSc patients. The highest performing algorithms incorporated clinical data with billing codes. EHR-based algorithms can identify SSc patients across a healthcare system, enabling researchers to examine important outcomes.

Keywords: Algorithms; Bioinformatics; Electronic health records; Systemic sclerosis.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Development of algorithms to identify patients with systemic sclerosis (SSc) in the electronic health record (EHR). At least a 1-time count of the SSc ICD-9 code (710.1) or ICD-10-CM codes (M34*) was applied to the 3 million subjects in Vanderbilt’s Synthetic Derivative, which resulted in 1899 potential SSc cases. Of these 1899 potential SSc cases, 200 were randomly selected for a training set to develop and test algorithms with various combinations of the SSc ICD-9 and ICD-10-CM codes, keyword search for Raynaud’s phenomenon, and positive ANA (≥ 1:80). The highest performing algorithm was internally validated in a set of 100 subjects who were not part of the original training set.
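The screening and sampling steps in the figure could be sketched as below. This is an illustration under assumptions: billing codes are taken to be plain strings, `has_ssc_code` and `split_for_review` are hypothetical helpers, and the source does not say how the 100 validation subjects were drawn beyond being outside the training set, so drawing them from the same shuffle is an assumption.

```python
import random


def has_ssc_code(codes):
    """True if any code is ICD-9 710.1 or an ICD-10-CM code starting with M34."""
    return any(c == "710.1" or c.startswith("M34") for c in codes)


def split_for_review(candidate_ids, n_train, n_valid, seed=0):
    """Randomly pick disjoint training and validation sets for chart review."""
    rng = random.Random(seed)        # fixed seed keeps the draw reproducible
    shuffled = sorted(candidate_ids)
    rng.shuffle(shuffled)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    return train, valid


# Placeholder IDs standing in for the 1899 potential SSc subjects.
train_ids, valid_ids = split_for_review(range(1899), n_train=200, n_valid=100)
```

Taking both sets from one shuffled list guarantees that no validation subject was also used for training.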
