Cardiologist-Level Arrhythmia Detection with Convolutional Neural Networks


Pranav Rajpurkar∗ (pranavsr@cs.stanford.edu)

Awni Y. Hannun∗ (awni@cs.stanford.edu)

Masoumeh Haghpanahi (mhaghpanahi@irhythmtech.com)

Codie Bourn (cbourn@irhythmtech.com)

Andrew Y. Ng (ang@cs.stanford.edu)

Abstract

We develop an algorithm which exceeds the performance of board-certified cardiologists in detecting a wide range of heart arrhythmias from electrocardiograms recorded with a single-lead wearable monitor. We build a dataset with more than 500 times as many unique patients as previously studied corpora. On this dataset, we train a 34-layer convolutional neural network which maps a sequence of ECG samples to a sequence of rhythm classes. Committees of board-certified cardiologists annotate a gold-standard test set on which we compare the performance of our model to that of 6 other individual cardiologists. We exceed the average cardiologist performance in both recall (sensitivity) and precision (positive predictive value).

1. Introduction

We develop a model which can diagnose irregular heart rhythms, also known as arrhythmias, from single-lead ECG signals better than the average individual cardiologist. Key to exceeding expert performance is a deep convolutional network which can map a sequence of ECG samples to a sequence of arrhythmia annotations, along with a novel dataset two orders of magnitude larger than previous datasets of its kind.

Many heart diseases, including Myocardial Infarction, AV Block, Ventricular Tachycardia and Atrial Fibrillation, can be diagnosed from ECG signals, and an estimated 300 million ECGs are recorded annually (Hedén et al., 1996). We investigate the task of arrhythmia detection from the ECG record. This is known to be a challenging task for computers, but an expert can usually perform it from a single, well-placed lead.

∗Authors contributed equally.

Project website at https://stanfordmlgroup.github.io/projects/ecg

Figure 1. Our trained convolutional neural network correctly detecting the sinus rhythm (SINUS) and Atrial Fibrillation (AFIB) from this ECG recorded with a single-lead wearable heart monitor.

Arrhythmia detection from ECG recordings is usually performed by expert technicians and cardiologists, given the high error rates of computerized interpretation. One study found that of all the computer predictions for non-sinus rhythms, only about 50% were correct (Shah & Rubin, 2007); in another study, only 1 out of every 7 presentations of second degree AV block was correctly recognized by the algorithm (Guglin & Thatai, 2006). To automatically detect heart arrhythmias in an ECG, an algorithm must implicitly recognize the distinct wave types and discern the complex relationships between them over time. This is difficult due to the variability in wave morphology between patients as well as the presence of noise.

We train a 34-layer convolutional neural network (CNN) to detect arrhythmias in arbitrary-length ECG time-series. Figure 1 shows an example of an input to the model. In addition to classifying noise and the sinus rhythm, the network learns to classify and segment twelve arrhythmia types present in the time-series. The model is trained end-to-end on a single-lead ECG signal sampled at 200 Hz, with a sequence of annotations for every second of the ECG as supervision. To make the optimization of such a deep model tractable, we use residual connections and batch normalization (He et al., 2016b; Ioffe & Szegedy, 2015). The depth increases both the non-linearity of the computation and the size of the context window for each classification decision.

arXiv:1707.01836v1 [cs.CV] 6 Jul 2017

We construct a dataset with more than 500 times as many unique patients as other datasets of its kind (Moody & Mark, 2001; Goldberger et al., 2000). One of the most popular previous datasets, the MIT-BIH corpus, contains ECG recordings from 47 unique patients. In contrast, we collect and annotate a dataset of about 30,000 unique patients from a pool of nearly 300,000 patients who have used the Zio Patch monitor¹ (Turakhia et al., 2013). We intentionally select patients exhibiting abnormal rhythms in order to make the class balance of the dataset more even and thus to raise the likelihood of observing unusual heart activity.

We test our model against board-certified cardiologists. A committee of three cardiologists serves as gold-standard annotators for the 336 examples in the test set. Our model exceeds the average individual cardiologist performance on both recall (sensitivity) and precision (positive predictive value) on this test set.

2. Model

Problem Formulation

The ECG arrhythmia detection task is a sequence-to-sequence task which takes as input an ECG signal $X = [x_1, \ldots, x_k]$ and outputs a sequence of labels $r = [r_1, \ldots, r_n]$, such that each $r_i$ can take on one of $m$ different rhythm classes. Each output label corresponds to a segment of the input. Together the output labels cover the full sequence.

For a single example in the training set, we optimize the cross-entropy objective function

$$L(X, r) = \sum_{i=1}^{n} \log p(R = r_i \mid X)$$

where $p(\cdot)$ is the probability the network assigns to the $i$-th output taking on the value $r_i$. Written this way, $L$ is the log-likelihood of the label sequence and is maximized, which is equivalent to minimizing the cross-entropy $-L(X, r)$.
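As a minimal illustration of the objective, the sketch below evaluates $L(X, r)$ for one example in plain Python, assuming the network's per-step softmax outputs are already available as lists of probabilities. The function names and the toy probabilities are hypothetical; a real implementation would compute this inside a deep learning framework on batches of logits.

```python
import math


def sequence_log_likelihood(probs, labels):
    """Return sum_i log p(R = r_i | X) for one training example.

    probs:  n per-step distributions; probs[i][c] is the probability the
            network assigns to class c at output step i.
    labels: the n ground-truth class indices r_1, ..., r_n.
    """
    if len(probs) != len(labels):
        raise ValueError("need exactly one distribution per output label")
    return sum(math.log(p[r]) for p, r in zip(probs, labels))


def cross_entropy_loss(probs, labels):
    # Minimizing -L(X, r) is equivalent to maximizing L(X, r).
    return -sequence_log_likelihood(probs, labels)


# Toy example with n = 3 output steps and m = 3 classes
# (hypothetical index order: 0 = SINUS, 1 = AFIB, 2 = NOISE).
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.3, 0.5],
]
labels = [0, 1, 2]
print(cross_entropy_loss(probs, labels))
```

Each output step contributes the log-probability of its own label, so the toy loss is -(log 0.7 + log 0.8 + log 0.5), roughly 1.273.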

Model Architecture and Training

We use a convolutional neural network for the sequence-to-sequence learning task. The high-level architecture of the network is shown in Figure 2. The network takes as input a time-series of raw ECG signal and outputs a sequence of label predictions. The 30-second ECG signal is sampled at 200 Hz, and the model outputs a new prediction roughly once every second. We arrive at an architecture which is 33 layers of convolution followed by a fully connected layer and a softmax.

[Figure 2 diagram: Input, then convolution, ReLU, Dropout and max pool layers, with the residual block repeated × 15, followed by dense and Softmax layers.]

Figure 2. The architecture of the network. The first and last layers are special-cased due to the pre-activation residual blocks. Overall, the network contains 33 layers of convolution followed by a fully-connected layer and a softmax.

¹iRhythm Technologies, San Francisco, California

Figure 3. Evaluated on the test set, the model outperforms the average cardiologist score on both the Sequence and the Set F1 metrics.

In order to make the optimization of such a network tractable, we employ shortcut connections in a similar manner to those found in the Residual Network architecture (He et al., 2015b). The shortcut connections between neural-network layers ease optimization by allowing information to propagate well through very deep neural networks. Before the input is fed into the network, it is normalized using a robust normalization strategy.

The network consists of 16 residual blocks with 2 convolutional layers per block. The convolutional layers all have a filter length of 16 and have $64k$ filters, where $k$ starts out as 1 and is incremented every fourth residual block. Every alternate residual block subsamples its inputs by a factor of 2, so the original input is ultimately subsampled by a factor of $2^8 = 256$. When a residual block subsamples the input, the corresponding shortcut connection also subsamples its input using a max pooling operation with the same subsample factor.
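The block counts above determine the filter widths and the output rate. A minimal sketch of that bookkeeping follows. Which residual blocks subsample and exactly where $k$ is incremented are not specified in the text, so the choices in `block_config` (a hypothetical helper) are assumptions, as is the "same" padding used to compute output lengths.

```python
import math

NUM_BLOCKS = 16       # residual blocks, each with 2 convolutional layers
BASE_FILTERS = 64     # number of filters is 64k
SAMPLE_RATE_HZ = 200  # ECG sampling rate


def block_config(index):
    """Return (num_filters, stride) for residual block `index` (0-based).

    Assumptions not fixed by the text: k is incremented at blocks 4, 8 and 12,
    and it is the odd-numbered blocks that subsample by a factor of 2.
    """
    k = 1 + index // 4
    stride = 2 if index % 2 == 1 else 1
    return BASE_FILTERS * k, stride


def output_length(num_samples):
    """Output time-steps for an input of `num_samples`, assuming 'same'
    padding so that a stride-2 block maps length L to ceil(L / 2)."""
    length = num_samples
    for index in range(NUM_BLOCKS):
        _, stride = block_config(index)
        length = math.ceil(length / stride)
    return length


total_subsampling = math.prod(block_config(i)[1] for i in range(NUM_BLOCKS))
record_samples = 30 * SAMPLE_RATE_HZ  # one 30-second record: 6000 samples

print("filters per stage:", [block_config(i)[0] for i in range(0, NUM_BLOCKS, 4)])
print("total subsampling:", total_subsampling)
print("seconds per output step:", total_subsampling / SAMPLE_RATE_HZ)
print("output steps per record:", output_length(record_samples))
```

Under these assumptions the script reports 64, 128, 192 and 256 filters for the four stages, a total subsampling of 256, one output step every 1.28 s, and 24 output steps for a 6000-sample record, consistent with a prediction roughly once per second.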

Before each convolutional layer we apply Batch Normalization (Ioffe & Szegedy, 2015) and a rectified linear activation, adopting the pre-activation block design (He et al., 2016a). The first and last layers of the network are special-cased due to this pre-activation block structure. We also apply Dropout (Srivastava et al., 2014) between the convolutional layers and after the non-linearity. The final fully connected layer and softmax activation produce a distribution over the 14 output classes for each time-step.
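The pre-activation ordering can be made explicit by listing the operations in sequence. The sketch below is a hypothetical bookkeeping helper rather than a trainable model. The exact composition of the special-cased first and last layers is our assumption, and "between the convolutional layers" is read as between the two convolutions of each block. It does reproduce the count of 33 convolutional layers: one special-cased initial convolution plus two in each of the 16 blocks.

```python
def layer_sequence(num_blocks=16):
    """List the network's operations in order under the pre-activation design.

    Assumptions (hypothetical): the special-cased first layer is
    conv -> batchnorm -> relu, the first residual block skips its leading
    batchnorm/relu because those were just applied, and dropout sits between
    the two convolutions of each block, after the non-linearity.
    """
    ops = ["conv", "batchnorm", "relu"]               # special-cased first layer
    for index in range(num_blocks):
        if index > 0:
            ops += ["batchnorm", "relu"]              # pre-activation before 1st conv
        ops += ["conv", "batchnorm", "relu", "dropout", "conv"]
        ops.append("add_shortcut")                    # identity, or max pool when subsampling
    ops += ["batchnorm", "relu", "dense", "softmax"]  # special-cased last layers
    return ops


ops = layer_sequence()
print(ops.count("conv"))  # 1 + 16 * 2 convolutions
print(ops[-2:])
```

Every convolution except the very first is thus preceded by batch normalization and a ReLU, which is why the first and last layers need special handling.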

We train the networks from scratch, initializing the weights of the convolutional layers as in He et al. (2015a). We use the Adam optimizer (Kingma & Ba, 2014) with the default parameters and reduce the learning rate by a factor of 10 when the validation loss stops improving. We save the best model as evaluated on the validation set during the optimization process.
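The training schedule can be sketched as a replay over per-epoch validation losses. This is a minimal illustration, not the authors' code: the `patience` value deciding when the loss has "stopped improving" is an assumption, and 1e-3 is the default Adam step size from Kingma & Ba (2014). In practice, framework facilities such as Keras's `ReduceLROnPlateau` and `ModelCheckpoint(save_best_only=True)` callbacks provide the same behavior.

```python
def replay_schedule(val_losses, initial_lr=1e-3, factor=10.0, patience=2):
    """Replay per-epoch validation losses; return (lr used at each epoch,
    index of the epoch whose model would be saved as best).

    `patience` (how many epochs without improvement count as
    "stopped improving") is an assumption.
    """
    lr = initial_lr
    best_loss = float("inf")
    best_epoch = None
    stale_epochs = 0
    lrs = []
    for epoch, loss in enumerate(val_losses):
        lrs.append(lr)                          # rate used while training this epoch
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch  # checkpoint the best model so far
            stale_epochs = 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                lr /= factor                    # reduce the learning rate 10x
                stale_epochs = 0
    return lrs, best_epoch


lrs, best_epoch = replay_schedule([1.0, 0.8, 0.85, 0.9, 0.7, 0.75, 0.72])
print(lrs)
print(best_epoch)
```

With these losses the rate drops from 1e-3 to 1e-4 after two epochs without improvement, and epoch 4 (loss 0.7) is the model that would be kept.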

3. Data

Training

We collect and annotate a dataset of 64,121 ECG records from 29,163 patients. The ECG data is sampled at a frequency of 200 Hz and is collected from a single-lead, non-invasive and continuous monitoring device called the Zio Patch, which has a wear period of up to 14 days (Turakhia et al., 2013). Each ECG record in the training set is 30 seconds long and can contain more than one rhythm type. Each record is annotated by a clinical ECG expert: the expert highlights segments of the signal and marks each as corresponding to one of the 14 rhythm classes.

The 30-second records were annotated using a web-based ECG annotation tool designed for this work. Label annotations were done by a group of Certified Cardiographic Technicians who have completed extensive training in arrhythmia detection and a cardiographic certification examination by Cardiovascular Credentialing International. The technicians were guided through the interface before they could annotate records. All rhythms present in a strip were labeled from their corresponding onset to offset, resulting in full segmentation of the input ECG data. To improve labeling consistency among different annotators, specific rules were devised regarding each rhythm transition.

We split the dataset into a training and a validation set, with 90% of the data in the training set. We split the dataset so that there is no patient overlap between the training and validation sets (or the test set described below).
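A patient-level split can be sketched as follows: patients, not records, are shuffled and divided, so all records from one patient stay on one side. The helper name, the seed and the synthetic patient identifiers are hypothetical; the text states only the 90% training fraction and the no-overlap constraint.

```python
import random


def split_by_patient(records, train_fraction=0.9, seed=0):
    """Split (patient_id, record_id) pairs so that every record of a given
    patient lands on the same side of the split.

    Patients, not records, are shuffled and divided, so the record-level
    fraction is only approximately `train_fraction`.
    """
    patients = sorted({patient for patient, _ in records})
    random.Random(seed).shuffle(patients)  # reproducible shuffle
    n_train = round(train_fraction * len(patients))
    train_patients = set(patients[:n_train])
    train = [r for r in records if r[0] in train_patients]
    val = [r for r in records if r[0] not in train_patients]
    return train, val


# Hypothetical data: 120 records from 50 patients.
records = [(f"patient{i % 50}", f"record{i}") for i in range(120)]
train, val = split_by_patient(records)
print(len(train), len(val))
```

Here 45 of the 50 patients go to training; because patients contribute different numbers of records, the record-level fraction is only close to 90%.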

Testing

We collect a test set of 336 records from 328 unique patients. For the test set, ground truth annotations for each record were obtained by a committee of three board-certified cardiologists; there are three committees responsible for different splits of the test set. The cardiologists discussed each individual record as a group and came to a consensus labeling. For each record in the test set we also collect 6 individual annotations from cardiologists not participating in the group. These are used to assess the performance of the model compared to an individual cardiologist.

Rhythm Classes

We identify 12 heart arrhythmias, sinus rhythm and noise, for a total of 14 output classes. The arrhythmias are characterized by a variety of features. Table 2 in the Appendix shows an example of each rhythm type we classify. The noise label is assigned when the device is disconnected from the skin or when the baseline noise in the ECG makes identification of the underlying rhythm impossible.

The morphology of the ECG during a single heart-beat as well as the pattern of the activity of the heart over time determine the underlying rhythm. In some cases the distinction between the rhythms can be subtle yet critical for treatment. For example, two forms of second degree AV Block, Mobitz I (Wenckebach) and Mobitz II (here referred to as AVB TYPE2), can be difficult to distinguish. Wenckebach is considered benign and Mobitz II is considered pathological, requiring immediate attention (Dubin, 1996).

                      Sequence            Set
Class                 Model    Cardiol.   Model    Cardiol.
Class-level F1 Score
AFIB                  0.604    0.515      0.667    0.544
AFL                   0.687    0.635      0.679    0.646
AVB TYPE2             0.689    0.535      0.656    0.529
BIGEMINY              0.897    0.837      0.870    0.849
CHB                   0.843    0.701      0.852    0.685
EAR                   0.519    0.476      0.571    0.529
IVR                   0.761    0.632      0.774    0.720
JUNCTIONAL            0.670    0.684      0.783    0.674
NOISE                 0.823    0.768      0.704    0.689
SINUS                 0.879    0.847      0.939    0.907
SVT                   0.477    0.449      0.658    0.556
TRIGEMINY             0.908    0.843      0.870    0.816
VT                    0.506    0.566      0.694    0.769
WENCKEBACH            0.709    0.593      0.806    0.736
Aggregate Results
Precision (PPV)       0.800    0.723      0.809    0.763
Recall (Sensitivity)  0.784    0.724      0.827    0.744
F1                    0.776    0.719      0.809    0.751

Table 1. The top part of the table gives a class-level comparison of the expert to the model F1 score for both the Sequence and the Set metrics. The bottom part of the table shows aggregate results over the full test set for precision, recall and F1 for both the Sequence and Set metrics.

Table 2 in the Appendix also shows the number of unique patients in the training (including validation) set and test set for each rhythm type.

4. Results

Evaluation Metrics

We use two metrics to measure model accuracy, using the cardiologist committee annotations as the ground truth.

Sequence Level Accuracy (F1): We measure the average overlap between the prediction and the ground truth sequence labels. For every record, a model is required to make a prediction approximately once per second (every 256 samples, i.e. every 1.28 s at 200 Hz). These predictions are compared against the ground truth annotation.

Set Level Accuracy (F1): Instead of treating the labels for a record as a sequence, we consider the set of unique arrhythmias present in each 30-second record as the ground truth annotation. Set Level Accuracy, unlike Sequence Level Accuracy, does not penalize time misalignment within a record. We report the F1 score between the unique class labels from the ground truth and those from the model prediction.

In both the Sequence and the Set case, we compute the F1 score for each class separately. We then compute the overall F1 (and precision and recall) as the class-frequency weighted mean.
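The two metrics differ only in what is compared: output steps for the Sequence metric, and the set of unique labels per record for the Set metric. The sketch below computes both, assuming "class-frequency weighted" means weighting each class's F1 by how often it occurs in the ground truth (its support); the function names and toy records are hypothetical. Precision and recall would be aggregated in the same way.

```python
from collections import Counter


def sequence_counts(records):
    """Step-by-step comparison. records: (truth, pred) label sequences with
    one label per model output step."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in records:
        if len(truth) != len(pred):
            raise ValueError("sequences must be aligned step by step")
        for t, p in zip(truth, pred):
            if t == p:
                tp[t] += 1
            else:
                fn[t] += 1  # the true class was missed at this step
                fp[p] += 1  # the predicted class was wrong at this step
    return tp, fp, fn


def set_counts(records):
    """Compare only the unique labels in each record, ignoring timing."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in records:
        true_set, pred_set = set(truth), set(pred)
        tp.update(true_set & pred_set)
        fp.update(pred_set - true_set)
        fn.update(true_set - pred_set)
    return tp, fp, fn


def weighted_f1(tp, fp, fn):
    """Per-class F1 and its mean weighted by ground-truth frequency (tp + fn)."""
    classes = set(tp) | set(fp) | set(fn)
    per_class = {}
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        per_class[c] = 2 * tp[c] / denom if denom else 0.0
    support = {c: tp[c] + fn[c] for c in classes}
    overall = sum(per_class[c] * support[c] for c in classes) / sum(support.values())
    return per_class, overall


records = [
    (["SINUS", "SINUS", "AFIB", "AFIB"], ["SINUS", "AFIB", "AFIB", "AFIB"]),
    (["SINUS"] * 4, ["SINUS"] * 4),
]
seq_f1, seq_overall = weighted_f1(*sequence_counts(records))
set_f1, set_overall = weighted_f1(*set_counts(records))
print(round(seq_overall, 3), set_overall)
```

In the toy data the first record is predicted AFIB one step early. The Sequence score drops to about 0.882, while the Set score stays at 1.0 because both records contain the correct unique labels.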

Model vs. Cardiologist Performance

We assess the cardiologist performance on the test set. Recall that each of the records in the test set has a ground truth label from a committee of three cardiologists as well as individual labels from a disjoint set of 6 other cardiologists. To assess cardiologist performance for each class, we take the average of all the individual cardiologist F1 scores, using the group label as the ground truth annotation.

Table 1 shows the breakdown of both cardiologist and model scores across the different rhythm classes. The model outperforms the average cardiologist performance on most rhythms, noticeably outperforming the cardiologists in the AV Block set of arrhythmias, which includes Mobitz I (Wenckebach), Mobitz II (AVB Type2) and complete heart block (CHB). This is especially useful given the severity of Mobitz II and complete heart block and the importance of distinguishing these two from Wenckebach, which is usually considered benign.

Table 1 also compares the aggregate precision, recall and F1 of both the model and the cardiologists against the ground truth annotations. The aggregate scores for the cardiologists are computed by taking the mean of the individual cardiologist scores. The model outperforms the cardiologist average in both precision and recall.

5. Analysis

The model outperforms the average cardiologist score on both the sequence and the set F1 metrics. Figure 4 shows a confusion matrix of the model predictions on the test set. Many arrhythmias are confused with the sinus rhythm. We expect that part of this is due to the sometimes ambiguous location of the exact onset and offset of the arrhythmia in the ECG record.
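The matrix in Figure 4 has a color scale from 0.0 to 1.0, which suggests that each row is normalized by the number of true examples of that class; that reading is an assumption. A minimal sketch of such a row-normalized confusion matrix over per-step labels, with hypothetical example pairs:

```python
from collections import Counter


def normalized_confusion(pairs, classes):
    """Row-normalized confusion matrix from (true_label, predicted_label) pairs.

    matrix[t][p] is the fraction of steps whose true label is t that the
    model labeled p, so every non-empty row sums to 1.
    """
    counts = Counter(pairs)
    matrix = {}
    for t in classes:
        row_total = sum(counts[(t, p)] for p in classes)
        matrix[t] = {
            p: (counts[(t, p)] / row_total if row_total else 0.0) for p in classes
        }
    return matrix


# Hypothetical per-step (true, predicted) pairs.
pairs = [("EAR", "SINUS"), ("EAR", "EAR"), ("EAR", "SINUS"), ("EAR", "EAR"),
         ("SINUS", "SINUS")]
matrix = normalized_confusion(pairs, ["EAR", "SINUS"])
print(matrix["EAR"])
```

Here half of the true EAR steps are labeled SINUS, the kind of off-diagonal mass discussed in this section.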

Often the mistakes made by the model are understandable. For example, confusing Wenckebach and AVB Type2 makes sense given that the two rhythms in general have very similar ECG morphologies. Similarly, Supraventricular Tachycardia (SVT) and Atrial Fibrillation (AFIB) are often confused with Atrial Flutter (AFL), which is understandable given that they are all atrial arrhythmias. We also note that Idioventricular Rhythm (IVR) is sometimes mistaken for Ventricular Tachycardia (VT), which again makes sense given that the two differ only in heart rate and are difficult to distinguish close to the 100 beats per minute boundary.


[Figure 4 image: confusion matrix with the true label on the vertical axis and the predicted label on the horizontal axis, one row and column per rhythm class from AFIB to WENCKEBACH, and a color scale from 0.0 to 1.0.]

Figure 4. A confusion matrix for the model predictions on the test set. Many of the mistakes the model makes are not surprising. For example, confusing second degree AV Block (Type 2) with Wenckebach makes sense given the often similar expression of the two arrhythmias in the ECG record.

One of the most common confusions is between Ectopic Atrial Rhythm (EAR) and sinus rhythm. The main distinguishing criterion for this rhythm is an irregular P wave. This can be subtle to detect, especially when the P wave has a small amplitude or when noise is present in the signal.

6. Related Work

Automatic high-accuracy methods for R-peak extraction have existed at least since the mid-1980s (Pan & Tompkins, 1985). Current algorithms for R-peak extraction tend to use wavelet transformations to compute features from the raw ECG, followed by finely-tuned threshold-based classifiers (Li et al., 1995; Martínez et al., 2004). Because accurate estimates of heart rate and heart rate variability can be extracted from R-peak features, feature-engineered algorithms are often used for coarse-grained heart rhythm classification, including detecting tachycardias (fast heart rate), bradycardias (slow heart rate), and irregular rhythms. However, such features alone are not sufficient to distinguish between most heart arrhythmias, since features based on the atrial activity of the heart as well as other features pertaining to the QRS morphology are needed.
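To make the rate-based features concrete, the sketch below derives heart rate and a simple heart rate variability measure (SDNN, the standard deviation of beat-to-beat intervals) from R-peak positions. It assumes the peaks have already been located, for example by a QRS detector in the spirit of Pan & Tompkins (1985), and are given as sample indices at 200 Hz. The function names are hypothetical and the 60 and 100 bpm cut-offs are the common adult thresholds; rate-only rules like these cannot separate rhythms that differ in P-wave or QRS morphology, which is the limitation described above.

```python
import statistics

SAMPLE_RATE_HZ = 200  # sampling rate of the single-lead recordings


def rate_features(r_peak_samples, fs=SAMPLE_RATE_HZ):
    """Mean heart rate (bpm) and SDNN (ms) from R-peak sample indices."""
    rr = [(b - a) / fs for a, b in zip(r_peak_samples, r_peak_samples[1:])]  # seconds
    heart_rate_bpm = 60.0 / statistics.mean(rr)
    sdnn_ms = statistics.stdev(rr) * 1000 if len(rr) > 1 else 0.0
    return heart_rate_bpm, sdnn_ms


def coarse_rate_class(heart_rate_bpm):
    """Rate-only category using the common adult thresholds of 60 and 100 bpm."""
    if heart_rate_bpm > 100:
        return "tachycardia"
    if heart_rate_bpm < 60:
        return "bradycardia"
    return "normal rate"


# Hypothetical R-peaks every 100 samples (0.5 s), i.e. 120 bpm.
hr, sdnn = rate_features([0, 100, 200, 300, 400])
print(hr, sdnn, coarse_rate_class(hr))
```

A perfectly regular rhythm gives an SDNN of zero, while irregular rhythms such as AFIB raise it; neither number says anything about P-wave shape.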

Much work has been done to automate the extraction of other features from the ECG. For example, beat classification is a common sub-problem of heart-arrhythmia classification. Drawing inspiration from automatic speech recognition, hidden Markov models with Gaussian observation probability distributions have been applied to the task of beat detection (Coast et al., 1990). Artificial neural networks have also been used for the task of beat detection (Melo et al., 2000). While these models have achieved high accuracy for some beat types, they are not yet sufficient for high-accuracy heart arrhythmia classification and segmentation. For example, Artis et al. (1991) train a neural network to distinguish between Atrial Fibrillation and Sinus Rhythm on the MIT-BIH dataset. While the network can distinguish between these two classes with high accuracy, it does not generalize to noisier single-lead recordings or classify among the full range of 15 rhythms available in MIT-BIH. This is in part due to insufficient training data, and in part because the model discards critical information in the feature extraction stage.

The most common dataset used to design and evaluate ECG algorithms is the MIT-BIH arrhythmia database (Moody & Mark, 2001), which consists of 48 half-hour strips of ECG data. Other commonly used datasets include the MIT-BIH Atrial Fibrillation dataset (Moody & Mark, 1983) and the QT dataset (Laguna et al., 1997). While useful benchmarks for R-peak extraction and beat-level annotations, these datasets are too small for fine-grained arrhythmia classification. The number of unique patients is a few hundred or fewer for these benchmarks. A recently released dataset captured from the AliveCor ECG monitor contains about 7000 records (Clifford et al., 2017). These records only have annotations for Atrial Fibrillation; all other arrhythmias are grouped into a single bucket. The dataset we develop contains 29,163 unique patients and 14 classes, with hundreds of unique examples for the rarest arrhythmias.

Machine learning models based on deep neural networks have consistently been able to approach and often exceed human agreement rates when large annotated datasets are available (Amodei et al., 2016; Xiong et al., 2016; He et al., 2015c). These approaches have also proven to be effective in healthcare applications, particularly in medical imaging, where pretrained ImageNet models can be applied (Esteva et al., 2017; Gulshan et al., 2016). We draw on work in automatic speech recognition for processing time-series with deep convolutional neural networks and recurrent neural networks (Hannun et al., 2014; Sainath et al., 2013), and on techniques in deep learning that make the optimization of these models tractable (He et al., 2016b;c; Ioffe & Szegedy, 2015).

7. Conclusion

We develop a model which exceeds the average cardiologist performance in detecting a wide range of heart arrhythmias from single-lead ECG records. Key to the performance of the model is a large annotated dataset and a very deep convolutional network which can map a sequence of ECG samples to a sequence of arrhythmia annotations.

On the clinical side, future work should investigate extending the set of arrhythmias and other forms of heart disease which can be automatically detected with high accuracy from single or multiple lead ECG records. For example, we do not detect Ventricular Flutter or Fibrillation. We also do not detect Left or Right Ventricular Hypertrophy, Myocardial Infarction or a number of other heart diseases which do not necessarily manifest as arrhythmias. Some of these may be difficult or even impossible to detect on a single-lead ECG but can often be seen on a multiple-lead ECG.

Given that more than 300 million ECGs are recorded annually, high-accuracy diagnosis from ECG can save expert clinicians and cardiologists considerable time and decrease the number of misdiagnoses. Furthermore, we hope that this technology coupled with low-cost ECG devices enables more widespread use of the ECG as a diagnostic tool in places where access to a cardiologist is difficult.

Acknowledgements

We thank Geoffrey H. Tison MD, MPH of UCSF for helpful feedback on the experiments and references.

References

Amodei, Dario, Anubhai, Rishita, Battenberg, Eric, Case, Carl, Casper, Jared, Catanzaro, Bryan, Chen, JingDong, Chrzanowski, Mike, Coates, Adam, Diamos, Greg, et al. Deep Speech 2: End-to-end speech recognition in English and Mandarin. In Proceedings of the 33rd International Conference on Machine Learning, pp. 173–182, 2016.

Artis, Shane G, Mark, RG, and Moody, GB. Detection of atrial fibrillation using artificial neural networks. In Computers in Cardiology 1991, Proceedings, pp. 173–176. IEEE, 1991.

Clifford, GD, Liu, CY, Moody, B, Lehman, L, Silva, I, Li, Q, Johnson, AEW, and Mark, RG. AF classification from a short single lead ECG recording: The PhysioNet Computing in Cardiology Challenge 2017. 2017.

Coast, Douglas A, Stern, Richard M, Cano, Gerald G, and Briller, Stanley A. An approach to cardiac arrhythmia analysis using hidden Markov models. IEEE Transactions on Biomedical Engineering, 37(9):826–836, 1990.

Dubin, Dale. Rapid Interpretation of EKG's. USA: Cover Publishing Company, 1996.

Esteva, Andre, Kuprel, Brett, Novoa, Roberto A, Ko, Justin, Swetter, Susan M, Blau, Helen M, and Thrun, Sebastian. Dermatologist-level classification of skin cancer with deep neural networks. Nature, 542(7639):115–118, 2017.

Goldberger, Ary L, Amaral, Luis AN, Glass, Leon, Hausdorff, Jeffrey M, Ivanov, Plamen Ch, Mark, Roger G, Mietus, Joseph E, Moody, George B, Peng, Chung-Kang, and Stanley, H Eugene. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation, 101(23):e215–e220, 2000.

Guglin, Maya E and Thatai, Deepak. Common errors in computer electrocardiogram interpretation. International Journal of Cardiology, 106(2):232–237, 2006.

Gulshan, Varun, Peng, Lily, Coram, Marc, Stumpe, Martin C, Wu, Derek, Narayanaswamy, Arunachalam, Venugopalan, Subhashini, Widner, Kasumi, Madams, Tom, Cuadros, Jorge, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA, 316(22):2402–2410, 2016.

Hannun, Awni Y., Case, Carl, Casper, Jared, Catanzaro, Bryan, Diamos, Greg, Elsen, Erich, Prenger, Ryan, Satheesh, Sanjeev, Sengupta, Shubho, Coates, Adam, and Ng, Andrew Y. Deep Speech: Scaling up end-to-end speech recognition. abs/1412.5567, 2014. URL http://arxiv.org/abs/1412.5567.

He, Kaiming, Zhang, Xiangyu, Ren, Shaoqing, and Sun, Jian. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. CoRR, abs/1502.01852, 2015a. URL http://arxiv.org/abs/1502.01852.

He, Kaiming, Zhang, Xiangyu, Ren, Shaoqing, and Sun, Jian. Deep residual learning for image recognition. CoRR, abs/1512.03385, 2015b. URL http://arxiv.org/abs/1512.03385.

He, Kaiming, Zhang, Xiangyu, Ren, Shaoqing, and Sun, Jian. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1026–1034, 2015c.

He, Kaiming, Zhang, Xiangyu, Ren, Shaoqing, and Sun, Jian. Identity mappings in deep residual networks. CoRR, abs/1603.05027, 2016a. URL http://arxiv.org/abs/1603.05027.

He, Kaiming, Zhang, Xiangyu, Ren, Shaoqing, and Sun, Jian. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016b.


He, Kaiming, Zhang, Xiangyu, Ren, Shaoqing, and Sun, Jian. Identity mappings in deep residual networks. In European Conference on Computer Vision, pp. 630–645. Springer, 2016c.

Hedén, Bo, Ohlsson, Mattias, Holst, Holger, Mjöman, Mattias, Rittner, Ralf, Pahlm, Olle, Peterson, Carsten, and Edenbrandt, Lars. Detection of frequently overlooked electrocardiographic lead reversals using artificial neural networks. The American Journal of Cardiology, 78(5):600–604, 1996.

Ioffe, Sergey and Szegedy, Christian. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.

Kingma, Diederik and Ba, Jimmy. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

Laguna, Pablo, Mark, Roger G, Goldberg, A, and Moody, George B. A database for evaluation of algorithms for measurement of QT and other waveform intervals in the ECG. In Computers in Cardiology 1997, pp. 673–676. IEEE, 1997.

Li, Cuiwei, Zheng, Chongxun, and Tai, Changfeng. Detection of ECG characteristic points using wavelet transforms. IEEE Transactions on Biomedical Engineering, 42(1):21–28, 1995.

Martínez, Juan Pablo, Almeida, Rute, Olmos, Salvador, Rocha, Ana Paula, and Laguna, Pablo. A wavelet-based ECG delineator: Evaluation on standard databases. IEEE Transactions on Biomedical Engineering, 51(4):570–581, 2004.

Melo, SL, Caloba, LP, and Nadal, J. Arrhythmia analysis using artificial neural network and decimated electrocardiographic data. In Computers in Cardiology 2000, pp. 73–76. IEEE, 2000.

Moody, George B and Mark, Roger G. A new method for detecting atrial fibrillation using RR intervals. Computers in Cardiology, 10(1):227–230, 1983.

Moody, George B and Mark, Roger G. The impact of the MIT-BIH arrhythmia database. IEEE Engineering in Medicine and Biology Magazine, 20(3):45–50, 2001.

Pan, Jiapu and Tompkins, Willis J. A real-time QRS detection algorithm. IEEE Transactions on Biomedical Engineering, (3):230–236, 1985.

Sainath, Tara N, Mohamed, Abdel-rahman, Kingsbury, Brian, and Ramabhadran, Bhuvana. Deep convolutional neural networks for LVCSR. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, pp. 8614–8618. IEEE, 2013.

Shah, Atman P and Rubin, Stanley A. Errors in the computerized electrocardiogram interpretation of cardiac rhythm. Journal of Electrocardiology, 40(5):385–390, 2007.

Srivastava, Nitish, Hinton, Geoffrey E, Krizhevsky, Alex, Sutskever, Ilya, and Salakhutdinov, Ruslan. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1):1929–1958, 2014.

Turakhia, Mintu P, Hoang, Donald D, Zimetbaum, Peter, Miller, Jared D, Froelicher, Victor F, Kumar, Uday N, Xu, Xiangyan, Yang, Felix, and Heidenreich, Paul A. Diagnostic utility of a novel leadless arrhythmia monitoring device. The American Journal of Cardiology, 112(4):520–524, 2013.

Xiong, Wayne, Droppo, Jasha, Huang, Xuedong, Seide, Frank, Seltzer, Mike, Stolcke, Andreas, Yu, Dong, and Zweig, Geoffrey. Achieving human parity in conversational speech recognition. arXiv preprint arXiv:1610.05256, 2016.

Appendix

Class        Description                                  Train + Val Patients   Test Patients
AFIB         Atrial Fibrillation                          4638
AFL          Atrial Flutter                               3805
AVB TYPE2    Second degree AV Block Type 2 (Mobitz II)    1905
BIGEMINY     Ventricular Bigeminy                         2855
CHB          Complete Heart Block                         843
EAR          Ectopic Atrial Rhythm                        2623
IVR          Idioventricular Rhythm                       1962
JUNCTIONAL   Junctional Rhythm                            2030
NOISE        Noise                                        9940
SINUS        Sinus Rhythm                                 22156                  215
SVT          Supraventricular Tachycardia                 6301
TRIGEMINY    Ventricular Trigeminy                        2864
VT           Ventricular Tachycardia                      4827
WENCKEBACH   Wenckebach (Mobitz I)                        2051

Table 2. A list of all of the rhythm types which the model classifies. For each rhythm we give the label name, a more descriptive name and an example chosen from the training set. We also give the total number of patients with each rhythm for both the training and test sets.