Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning

Hoo-Chang Shin et al. IEEE Trans Med Imaging. 2016 May;35(5):1285-98. doi: 10.1109/TMI.2016.2528162. Epub 2016 Feb 11.

Abstract

Remarkable progress has been made in image recognition, primarily due to the availability of large-scale annotated datasets and deep convolutional neural networks (CNNs). CNNs enable learning data-driven, highly representative, hierarchical image features from sufficient training data. However, obtaining datasets as comprehensively annotated as ImageNet in the medical imaging domain remains a challenge. There are currently three major techniques that successfully apply CNNs to medical image classification: training the CNN from scratch, using off-the-shelf pre-trained CNN features, and conducting unsupervised CNN pre-training with supervised fine-tuning. Another effective method is transfer learning, i.e., fine-tuning CNN models pre-trained on natural image datasets for medical image tasks. In this paper, we exploit three important, but previously understudied, factors in applying deep convolutional neural networks to computer-aided detection problems. We first explore and evaluate different CNN architectures. The studied models contain 5 thousand to 160 million parameters and vary in number of layers. We then evaluate the influence of dataset scale and spatial image context on performance. Finally, we examine when and why transfer learning from ImageNet pre-trained models (via fine-tuning) can be useful. We study two specific computer-aided detection (CADe) problems, namely thoraco-abdominal lymph node (LN) detection and interstitial lung disease (ILD) classification. We achieve state-of-the-art performance on mediastinal LN detection and report the first five-fold cross-validation classification results on predicting axial CT slices with ILD categories. Our extensive empirical evaluation, CNN model analysis, and insights can be extended to the design of high-performance CAD systems for other medical imaging tasks.
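The five-fold cross-validation protocol mentioned above is usually set up so that splits are made at the patient level, so slices from one patient never appear in both training and test folds. A minimal sketch of such a split (our illustration, not the authors' code; the greedy balancing heuristic is an assumption):

```python
from collections import defaultdict

def patient_level_folds(slice_ids, patient_of, k=5):
    """Partition CT slice indices into k folds such that all slices
    from the same patient land in the same fold (no patient leakage)."""
    by_patient = defaultdict(list)
    for s in slice_ids:
        by_patient[patient_of[s]].append(s)
    folds = [[] for _ in range(k)]
    # Greedy balancing: assign each patient's slices to the currently smallest fold.
    for slices in sorted(by_patient.values(), key=len, reverse=True):
        smallest = min(range(k), key=lambda i: len(folds[i]))
        folds[smallest].extend(slices)
    return folds

# Hypothetical toy data: 10 slices from 4 patients, split into 3 folds.
patient_of = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B",
              5: "C", 6: "C", 7: "C", 8: "C", 9: "D"}
folds = patient_level_folds(list(range(10)), patient_of, k=3)
```

Each fold then serves once as the test set while the remainder trains the CNN; the reported metric is averaged over the k runs.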


Figures

Fig. 1. Some examples of abdominal and mediastinal lymph nodes sampled on axial (ax), coronal (co), and sagittal (sa) views, with four different fields-of-view (30 mm: orange; 45 mm: red; 85 mm: green; 128 mm: blue) surrounding the lymph nodes.
Fig. 2. Some examples of CT image slices with six lung tissue types in the ILD dataset. Diseased tissue regions are indicated with dark orange arrows. (a) healthy; (b) emphysema; (c) ground glass; (d) fibrosis; (e) micronodules; (f) consolidation.
Fig. 3. Some examples of 64×64 CT image patches for (a) NM, (b) EM, (c) GG, (d) FB, (e) MN, (f) CD.
Fig. 4. An example of lung/high-attenuation/low-attenuation CT windowing for an axial lung CT slice. We encode the lung, high-attenuation, and low-attenuation CT windows into the red, green, and blue channels, respectively.
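The windowing scheme described in Fig. 4 can be sketched as below. The specific Hounsfield-unit window bounds are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

def windowed_rgb(hu_slice):
    """Map one CT slice (in Hounsfield units) to a 3-channel image:
    lung / high-attenuation / low-attenuation windows -> R / G / B."""
    def window(img, lo, hi):
        # Clip to [lo, hi] and rescale to [0, 1].
        return (np.clip(img, lo, hi) - lo) / (hi - lo)
    r = window(hu_slice, -1400, -200)   # lung window (assumed bounds)
    g = window(hu_slice, -160, 240)     # high-attenuation / soft-tissue window (assumed)
    b = window(hu_slice, -1000, -775)   # low-attenuation window (assumed)
    return np.stack([r, g, b], axis=-1)

# Toy 2x2 slice in HU; result has shape (2, 2, 3) with values in [0, 1].
slice_hu = np.array([[-1000.0, -600.0], [40.0, 300.0]])
rgb = windowed_rgb(slice_hu)
```

This turns a single-channel CT slice into a three-channel input compatible with CNNs pre-trained on RGB natural images.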
Fig. 5. A simplified illustration of the CNN architectures used. GoogLeNet contains two convolution layers, three pooling layers, and nine inception layers. Each inception layer of GoogLeNet consists of six convolution layers and one pooling layer.
Fig. 6. Illustration of an inception layer of GoogLeNet. Inception layers of GoogLeNet consist of six convolution layers with different kernel sizes and one pooling layer.
Fig. 7. Some examples from the CIFAR-10 dataset and some images of the "tennis ball" class from the ImageNet dataset. CIFAR-10 images are small (32×32), with the object of the class category centered. ImageNet images are larger (256×256), and the object of the class category can be small, obscured, partial, and sometimes in a cluttered environment.
Fig. 8. FROC curves averaged over three-fold cross-validation for the abdominal (left) and mediastinal (right) lymph nodes using different CNN models.
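As background for how curves like those in Fig. 8 are constructed: at each detection-score threshold, the y-axis is sensitivity (fraction of true lymph nodes found) and the x-axis is the average number of false positives per scan. A minimal sketch (our illustration, not the authors' evaluation code):

```python
def froc_points(detections, n_true, n_scans):
    """detections: list of (score, is_true_positive) over all candidate
    detections from all scans. Returns (fp_per_scan, sensitivity) pairs
    as the threshold is lowered past each candidate's score."""
    dets = sorted(detections, key=lambda d: d[0], reverse=True)
    points, tp, fp = [], 0, 0
    for score, is_tp in dets:
        if is_tp:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_scans, tp / n_true))
    return points

# Hypothetical toy example: 4 candidates over 2 scans, 2 true nodes.
pts = froc_points([(0.9, True), (0.8, False), (0.6, True), (0.2, False)],
                  n_true=2, n_scans=2)
```

Plotting these points (sensitivity against FP/scan) yields the FROC curve; averaging curves over cross-validation folds gives figures like Fig. 8.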
Fig. 9. Examples of misclassified lymph nodes (in axial view): false negatives (left) and false positives (right). Mediastinal LN examples are shown in the upper row and abdominal LN examples in the bottom row.
Fig. 10. Visual examples of misclassified 64×64 ILD patches (in axial view), with their ground-truth labels and the incorrectly predicted labels.
Fig. 11. Traces of training and validation loss (blue and green lines) and validation accuracy (orange lines) during (a) training AlexNet from random initialization and (b) fine-tuning from an ImageNet pre-trained CNN, for ILD classification.
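Loss traces like those in Fig. 11 are commonly used to decide when to stop training: once validation loss stops improving, further epochs mainly overfit. A minimal early-stopping helper (our illustration, not the paper's procedure):

```python
def early_stop_epoch(val_losses, patience=3):
    """Return the epoch at which training would stop: the first epoch at
    which validation loss has failed to improve for `patience` epochs.
    Returns None if the trace ends before the criterion triggers."""
    best, best_epoch = float("inf"), -1
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch
    return None

# Hypothetical validation-loss trace: improves, then plateaus.
stop = early_stop_epoch([1.0, 0.7, 0.5, 0.52, 0.51, 0.53, 0.55])
```

Fine-tuned models typically reach their validation-loss minimum in fewer epochs than models trained from random initialization, which is the comparison Fig. 11 makes.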
Fig. 12. Visualization of first-layer convolution filters of CNNs trained on abdominal and mediastinal LNs, in RGB color, from random initialization (AlexNet-RI (256×256), AlexNet-RI (64×64), GoogLeNet-RI (256×256), and GoogLeNet-RI (64×64)) and with transfer learning (AlexNet-TL (256×256)).
Fig. 13. Visualization of the last pooling layer (pool5) activations (top). Pooling units whose relative image location corresponds to the disease region are highlighted with green boxes. The original images reconstructed from the units are shown at the bottom. The examples in (a) and (b) are computed from the input ILD images in Figs. 2(b) and 2(c), respectively.
