Convolutional Neural Network

I have planned a three-article series to demonstrate the use of transfer learning in activity detection through image processing. In this first article, I will describe the basic concept of the convolution process, followed by a second article on ConvNet architecture and design parameters. In the third article, I will use the Keras library to create and train a ConvNet for object detection using transfer learning.

What is Convolution?

Convolution is defined mathematically as a function that integrates the product of two functions, with one of the functions flipped. In machine learning, feature extraction is one of the most important and critical steps for several deep learning algorithms, and researchers have demonstrated the use of convolution to extract low-dimensional features from input data. A convolution process employs the concept of a sliding window: a square matrix, aka a filter, with learnable weights is slid over the input data to produce a weighted sum (of weights and input) as the output. This weighted sum is the feature space, which is used as the input for the next layers.

Convolution comes in several types. 1D convolution is generally used on sequence data for extracting local 1D subsequences; one good usage is NLP, where every sentence is represented as a sequence of words. Image data, however, is represented as 2D data and hence best suits the 2D convolution operation. In a 2D convolution, the filter moves in two directions (x, y) to calculate low-dimensional features from the image data, producing a 2-dimensional matrix as output. In this article, we will focus on the 2D and 3D convolution processes for constructing a convolutional neural network.

2D Convolution of Input Signal

The convolution process, as shown in the figure, applies an appropriate filter, aka kernel, over an input matrix of either image or text. The kernel is an (n x n) matrix of numbers, where n is also called the filter size. The values of these numbers are the trainable parameters and are determined through a training process such as gradient-based backpropagation. The convolution process performs an element-wise multiplication of the input matrix and the kernel while sliding the kernel horizontally and vertically over the entire input matrix. The step size of the kernel while traversing the image is termed the stride; its default value is 1, but a value of 2 is also used to downsize the input image. In the figure above, a 5x5 input image is convolved with a sliding 3x3 filter.

This can be explained with a simple example. Assume the input image is of size (5, 5) and the filter f is of size (3, 3). For element-wise multiplication, the kernel is placed on a receptive field of the input image, in this case the first 3x3 elements. Each number in the filter is multiplied with the corresponding pixel among the 9 pixels at the top-left of the input image, and the products are summed to give a single value, which is assigned to the top-left of the output layer as shown in Figure 2. The process is then repeated by shifting the kernel one column (i.e. stride = 1) and performing the element-wise multiplication again, as shown in the figure. The convolution continues until the kernel has been swept over the full input matrix. The output dimension of the convoluted image can be calculated from the formula ((n + 2p - f)/s + 1) x ((n + 2p - f)/s + 1), where n is the height and width of the input image, f is the filter size, p is the padding, and s is the stride.
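The sliding-window computation described above can be sketched in a few lines of NumPy. This is a minimal illustration only; the function name `conv2d` and the toy input values are my own, not from any library:

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Valid (no padding) 2D convolution as used in CNNs:
    element-wise multiply each receptive field by the kernel, then sum."""
    n, f = image.shape[0], kernel.shape[0]
    out = (n - f) // stride + 1
    result = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            patch = image[i*stride:i*stride+f, j*stride:j*stride+f]
            result[i, j] = np.sum(patch * kernel)  # weighted sum of the window
    return result

image = np.arange(25).reshape(5, 5).astype(float)  # toy 5x5 input
kernel = np.ones((3, 3))                           # toy 3x3 filter
print(conv2d(image, kernel).shape)  # (3, 3)
```

With a 5x5 input, a 3x3 filter and stride 1, the output is (5 - 3)/1 + 1 = 3 pixels per side, matching the formula above.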

Size of the Convoluted Image

The output size of this convolution process can be calculated from the formula ((n + 2p - f)/s + 1) x ((n + 2p - f)/s + 1), where n is the height and width of the input image, f is the filter size, p is the padding, and s is the stride. It should be noted that the convoluted image is smaller than the input image. This is due to the subtraction of the filter size f from the height and width of the input image, which means a larger filter will produce a smaller convoluted image.
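The formula is easy to encode as a helper function (a sketch; the name `conv_output_size` is hypothetical):

```python
def conv_output_size(n, f, p=0, s=1):
    """Output side length of a convolution: (n + 2p - f)/s + 1."""
    return (n + 2 * p - f) // s + 1

print(conv_output_size(5, 3))       # 3  -> a 3x3 filter shrinks a 5x5 image
print(conv_output_size(5, 3, p=1))  # 5  -> one pixel of padding preserves the size
```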

Padding and Stride of 2D Convolution Process

During the convolution process, at each step the position of the filter window is updated according to the strides argument. One drawback is that the corner pixels of the input matrix are used only once, while the middle pixels are used many times across the convolutions; this means that a lot of information at the borders of the image is wasted or thrown away. Therefore, in order to compute values for those border regions, the input may be extended by padding it with zeros. In some cases we may want to discard these border regions, in which case no padding is required. In other words, padding is achieved by adding additional rows and columns of zeros at the top, bottom, left and right of the input matrix. Adding p pixels on each side extends an n x n image to an (n + 2p) x (n + 2p) image, where p is the number of added pixels, i.e. the padding. Thus a 5x5 image with p = 1 is extended to a 7x7 image. The process of padding is shown in Figure 3.
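Zero padding as described can be demonstrated with NumPy's `pad` (the toy array is my own example):

```python
import numpy as np

x = np.arange(25).reshape(5, 5)   # toy 5x5 input
padded = np.pad(x, pad_width=1)   # one row/column of zeros on every side
print(padded.shape)               # (7, 7)
```

Note that the original pixels sit in the centre, with zeros forming the new border.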

Valid and Same Padding 

When no padding is applied, the output image shrinks, since p = 0 in the output dimension (n + 2p - f)/s + 1. No padding is also called "VALID" padding, meaning the image after the convolution process will be smaller because no padding is added. We can, however, calculate the value of padding that results in an output the same size as the input, in which case it is termed "SAME" padding. The formula for p in "SAME" padding follows from (n + 2p - f)/s + 1 = n, which gives 2p = (n - 1)*s + f - n. For example, for a 7x7 image with filter size = 3 and stride = 2, the value of p should be 4 in order to get a 7x7 image after convolution.
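The "SAME" padding calculation can be written directly from the rearranged formula (a sketch; `same_padding` is a hypothetical helper name):

```python
def same_padding(n, f, s=1):
    """Per-side padding p that keeps the output the same size as the input,
    solving (n + 2p - f)/s + 1 = n  =>  2p = (n - 1)*s + f - n."""
    return ((n - 1) * s + f - n) / 2

print(same_padding(7, 3, s=2))  # 4.0, matching the worked example above
```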

Convolution of 3D Images

Now let's talk about 3D convolution. The basic principle is still the same; however, the convolution of a 3D image having RGB channels requires that the number of channels of the filter match that of the image. This 3D convolution produces a 2D convoluted image with reduced dimensions ((n + 2p - f)/s + 1) x ((n + 2p - f)/s + 1), as shown in the animated figure.

Let a 9x9x3 image be convolved with a 3x3x3 filter; note that both have the same number of channels, in this case 3. For 3D convolution, each channel of the filter is convolved with the corresponding channel of the input image. Thus, for a 3x3x3 filter, each of the 27 elements is multiplied with the corresponding 27 elements of the image, and the sum of these products is the first number in the output image, as shown in the figure. This completes the first step. The process is then repeated by shifting the filter one column (i.e. stride = 1) and performing the element-wise multiplication again, as shown in the figure. The convolution continues until the kernel has been swept over the full input matrix, resulting in one 2D convoluted image. If the input image is convolved with another 3x3x3 filter, it results in another 2D convoluted image; thus, as the number of filters increases, the number of channels in the output also increases.
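The multi-channel, multi-filter behaviour can be sketched in NumPy (an illustration under my own naming, not any library's API; each filter sums over all 27 products to yield one 2D map, and stacking the maps gives the output channels):

```python
import numpy as np

def conv_multichannel(image, filters, stride=1):
    """image: (n, n, c); filters: list of (f, f, c) kernels.
    Each filter collapses all channels into one 2D feature map."""
    n = image.shape[0]
    f = filters[0].shape[0]
    out = (n - f) // stride + 1
    maps = np.zeros((out, out, len(filters)))
    for k, kern in enumerate(filters):
        for i in range(out):
            for j in range(out):
                patch = image[i*stride:i*stride+f, j*stride:j*stride+f, :]
                maps[i, j, k] = np.sum(patch * kern)  # all f*f*c products summed
    return maps

image = np.random.rand(9, 9, 3)                         # toy 9x9 RGB image
filters = [np.random.rand(3, 3, 3) for _ in range(2)]   # two 3x3x3 filters
print(conv_multichannel(image, filters).shape)          # (7, 7, 2)
```

Two filters produce two output channels, and (9 - 3)/1 + 1 = 7 per side, as the text describes.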

The output image actually gives us features. Different choices of filters will give different kinds of feature maps; for example, a vertical-edge filter gives us vertical edges as features, and a horizontal-edge filter gives horizontal ones. We can detect edges in one channel (i.e. any one of R, G or B) or in all channels. For example, if we want to find vertical edges only in the red channel, we use a vertical-edge filter for the red channel and set the filter values for all other channels to zero, so that it does not pick up edges in the other channels.
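Building a filter that responds only to the red channel looks like this (a sketch with my own toy values; the vertical-edge kernel is one common choice, not the only one):

```python
import numpy as np

# Vertical-edge kernel, placed in the red channel only; green and blue are zeroed
vertical = np.array([[1., 0., -1.],
                     [1., 0., -1.],
                     [1., 0., -1.]])
kernel = np.zeros((3, 3, 3))
kernel[:, :, 0] = vertical  # channel 0 = red

# Toy 3x3 RGB patch: red channel has a bright left column (a vertical edge),
# green and blue are uniform zero
image = np.zeros((3, 3, 3))
image[:, :1, 0] = 1.0

response = np.sum(image * kernel)  # one output pixel of the convolution
print(response)  # 3.0 -> strong response to the red-channel edge
```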

That's all about convolution. In the next article, I will describe how to use this concept to create a CNN.

