Restructuring batch normalization to accelerate CNN training

W Jung, D Jung, B Kim, S Lee, W Rhee, JH Ahn
Proceedings of Machine Learning and Systems, 2019
proceedings.mlsys.org
Abstract
Batch Normalization (BN) has become a core design block of modern Convolutional Neural Networks (CNNs). A typical modern CNN has a large number of BN layers in its lean and deep architecture. BN requires mean and variance calculations over each mini-batch during training. Therefore, the existing memory access reduction techniques, such as fusing multiple CONV layers, are not effective for accelerating BN due to their inability to optimize mini-batch related calculations during training. To address this increasingly important problem, we propose to restructure BN layers by first splitting a BN layer into two sub-layers (fission) and then combining the first sub-layer with its preceding CONV layer and the second sub-layer with the following activation and CONV layers (fusion). The proposed solution can significantly reduce main-memory accesses while training the latest CNN models, and the experiments on a chip multiprocessor show that the proposed BN restructuring can improve the performance of DenseNet-121 by 25.7%.
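The fission and fusion idea can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (`bn_stats`, `bn_apply_relu`, `bn_reference`), the NCHW layout, and the use of ReLU as the activation are assumptions made for illustration. The sketch splits BN into a statistics sub-layer, whose per-channel sums could in principle be accumulated while the preceding CONV layer writes its output, and a normalize sub-layer, which reduces to a per-channel scale and shift that could be applied together with the activation as the next layer reads its input. It then checks that the split version matches ordinary training-mode BN followed by ReLU.

```python
import numpy as np

def bn_reference(x, gamma, beta, eps=1e-5):
    """Ordinary training-mode BN over a mini-batch; x has shape (N, C, H, W)."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)  # biased (population) variance
    g = gamma.reshape(1, -1, 1, 1)
    b = beta.reshape(1, -1, 1, 1)
    return g * (x - mean) / np.sqrt(var + eps) + b

def bn_stats(x):
    """Sub-layer 1 (assumed form): per-channel mean and variance from a sum
    and a sum of squares, both obtainable in a single pass over the data."""
    n = x.shape[0] * x.shape[2] * x.shape[3]
    s = x.sum(axis=(0, 2, 3))
    sq = (x * x).sum(axis=(0, 2, 3))
    mean = s / n
    var = sq / n - mean ** 2  # E[x^2] - E[x]^2
    return mean, var

def bn_apply_relu(x, mean, var, gamma, beta, eps=1e-5):
    """Sub-layer 2 (assumed form): normalization folded into one per-channel
    scale and shift, followed immediately by ReLU."""
    scale = gamma / np.sqrt(var + eps)
    shift = beta - mean * scale
    y = x * scale.reshape(1, -1, 1, 1) + shift.reshape(1, -1, 1, 1)
    return np.maximum(y, 0.0)

# Usage: the split version agrees with reference BN followed by ReLU.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3, 5, 5))
gamma = rng.standard_normal(3)
beta = rng.standard_normal(3)
mean, var = bn_stats(x)
fused_out = bn_apply_relu(x, mean, var, gamma, beta)
reference_out = np.maximum(bn_reference(x, gamma, beta), 0.0)
print(np.allclose(fused_out, reference_out))
```

In NumPy each function still makes its own pass over `x`, so the sketch shows only the arithmetic equivalence. Any reduction in main-memory accesses would come from a kernel that performs these steps inside the neighboring CONV layers, which this example does not attempt.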