Improved kiwifruit detection using pre-trained VGG16 with RGB and NIR information fusion

Z Liu, J Wu, L Fu, Y Majeed, Y Feng, R Li, Y Cui - IEEE Access, 2019 - ieeexplore.ieee.org
This study presents a novel method that applies RGB-D (Red Green Blue-Depth) sensors and fuses aligned RGB and NIR images with deep convolutional neural networks (CNNs) for fruit detection. It aims to build a more accurate, faster, and more reliable fruit detection system, which is a vital element for fruit yield estimation and automated harvesting. Recent work in deep neural networks has led to the development of a state-of-the-art object detector termed Faster Region-based CNN (Faster R-CNN). A common Faster R-CNN backbone, VGG16, was adopted through transfer learning for the task of kiwifruit detection using imagery obtained from two modalities: RGB (red, green, blue) and near-infrared (NIR) images. A Kinect v2 was used to capture bottom-view NIR and RGB images of the kiwifruit canopy. The NIR image (1 channel) and RGB image (3 channels) were aligned and arranged side by side into a 6-channel image, and the input layer of VGG16 was modified to receive this 6-channel input. Two different fusion methods were used to extract features: Image-Fusion (fusion of the RGB and NIR images at the input layer) and Feature-Fusion (fusion of the feature maps of two VGG16 networks to which the RGB and NIR images were input respectively). The improved networks were trained end-to-end using back-propagation and stochastic gradient descent and compared to the original VGG16 networks with RGB-only and NIR-only input. Results showed that the average precisions (APs) of the original VGG16 with RGB-only and NIR-only input were 88.4% and 89.2% respectively; the 6-channel VGG16 using the Feature-Fusion method reached 90.5%, while the Image-Fusion method reached the highest AP of 90.7% and the fastest detection speed of 0.134 s/image. These results indicate that the proposed approach shows potential for improved kiwifruit detection.
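The Image-Fusion step described above can be sketched as simple channel stacking. Note an assumption: since RGB (3 channels) plus NIR (1 channel) gives only 4 channels, the sketch below replicates the single NIR channel to 3 channels to reach the 6-channel input the abstract describes; the paper may use a different arrangement. The function name `fuse_rgb_nir` is hypothetical, not from the paper.

```python
import numpy as np

def fuse_rgb_nir(rgb: np.ndarray, nir: np.ndarray) -> np.ndarray:
    """Stack an aligned RGB image (H, W, 3) and an NIR image into one
    6-channel array, roughly as in the paper's Image-Fusion method.

    Assumption: a single-channel NIR image is replicated to 3 channels
    so the fused input has 6 channels total.
    """
    if nir.ndim == 2:                              # (H, W) -> (H, W, 3)
        nir = np.repeat(nir[..., None], 3, axis=-1)
    if rgb.shape != nir.shape:
        raise ValueError("RGB and NIR images must be aligned to the same size")
    return np.concatenate([rgb, nir], axis=-1)     # (H, W, 6)

# Example: a 480x640 aligned frame pair yields a 6-channel input.
fused = fuse_rgb_nir(np.zeros((480, 640, 3)), np.zeros((480, 640)))
```

The 6-channel array would then feed a VGG16 whose first convolution is widened from 3 to 6 input channels, as the modified input layer in the abstract implies.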