LeNet-5
Here we look at the architecture of the LeNet-5 neural network.
- The input image has size 32 x 32 x 1. The network was trained on grayscale images, so there is only one channel. LeNet-5 was used to recognize handwritten digits.
- The first convolutional layer uses 6 filters of size 5 x 5 with stride s = 1. The output of this layer has size 28 x 28 x 6.
- An average pooling layer is applied with f = 2 and s = 2. This gives an output of size 14 x 14 x 6.
- The second convolutional layer uses 16 filters of size 5 x 5 with stride s = 1. The output has size 10 x 10 x 16.
- Again an average pooling layer is applied with f = 2 and s = 2. This gives an output of size 5 x 5 x 16.
- The next layer is a fully connected layer with 120 neurons. The 5 x 5 x 16 = 400 values are flattened and densely connected to this layer.
- The layer with 120 neurons is then densely connected to a layer with 84 neurons. These 84 features are used to produce the final output.
- The output layer has 10 neurons, each corresponding to the probability of the image being a particular digit.
The original LeNet-5 has a non-linearity after pooling.
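The layer sizes listed for LeNet-5 can be checked with the usual output-size formula floor((n + 2p - f) / s) + 1. The following is a minimal sketch in plain Python, not a trainable model; the helper names `out_size` and `trace_shapes` and the layer labels are hypothetical, chosen only for this illustration, and padding is assumed to be zero throughout.

```python
def out_size(n, f, s=1, p=0):
    """Spatial size after a conv or pooling window of size f, stride s, padding p."""
    return (n + 2 * p - f) // s + 1


# Each entry: (label, window size f, stride s, output channels).
# None for output channels means the layer (pooling) keeps the channel count.
lenet5_layers = [
    ("conv1", 5, 1, 6),
    ("avgpool1", 2, 2, None),
    ("conv2", 5, 1, 16),
    ("avgpool2", 2, 2, None),
]


def trace_shapes(size, channels, layers):
    """Return the (label, height, width, channels) shape after every layer."""
    shapes = [("input", size, size, channels)]
    for label, f, s, out_channels in layers:
        size = out_size(size, f, s)
        if out_channels is not None:
            channels = out_channels
        shapes.append((label, size, size, channels))
    return shapes


lenet5_shapes = trace_shapes(32, 1, lenet5_layers)
for label, h, w, c in lenet5_shapes:
    print(f"{label}: {h} x {w} x {c}")

# Number of values fed into the 120-neuron fully connected layer.
_, h, w, c = lenet5_shapes[-1]
lenet5_flattened = h * w * c
print("flattened:", lenet5_flattened)
```

The last printed value is the size of the flattened vector that feeds the 120-neuron layer.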
AlexNet
- AlexNet takes an input image of size 227 x 227 x 3.
- The first layer uses 96 filters of size 11 x 11 with s = 4. This gives an output of size 55 x 55 x 96.
- We apply max pooling with f = 3 and s = 2. This gives an output of size 27 x 27 x 96.
- We then apply a same convolution with 256 filters of size 5 x 5 to get an output of size 27 x 27 x 256.
- Again we apply max pooling with f = 3 and s = 2 to give an output of size 13 x 13 x 256.
- The third convolution layer uses 384 filters of size 3 x 3 to get an output of size 13 x 13 x 384.
- The fourth convolution layer uses 384 filters of size 3 x 3 to give an output of size 13 x 13 x 384.
- The fifth convolution layer uses 256 filters of size 3 x 3 to give an output of size 13 x 13 x 256.
- Then max pooling with f = 3 and s = 2 gives an output of size 6 x 6 x 256.
- This output is flattened into a vector with 9216 nodes. It is densely connected to a fully connected layer with 4096 nodes, which connects to another 4096-neuron layer, which is then connected to a softmax layer with 1000 neurons.
The ReLU activation function was used in this neural network.
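The AlexNet sizes above mix unpadded layers with same convolutions, so it can help to trace them explicitly. This is a rough sketch in plain Python under two assumptions: a same convolution with stride 1 and an odd filter size f uses padding (f - 1) / 2, and all other layers use no padding. The names `out_size`, `same_padding` and the layer labels are hypothetical.

```python
def out_size(n, f, s=1, p=0):
    """Spatial size after a conv or pooling window of size f, stride s, padding p."""
    return (n + 2 * p - f) // s + 1


def same_padding(f):
    """Padding that preserves the size for stride 1 and an odd filter size f."""
    return (f - 1) // 2


# Each entry: (label, f, s, p, output channels); None keeps the channel count.
alexnet_layers = [
    ("conv1", 11, 4, 0, 96),
    ("maxpool1", 3, 2, 0, None),
    ("conv2", 5, 1, same_padding(5), 256),
    ("maxpool2", 3, 2, 0, None),
    ("conv3", 3, 1, same_padding(3), 384),
    ("conv4", 3, 1, same_padding(3), 384),
    ("conv5", 3, 1, same_padding(3), 256),
    ("maxpool3", 3, 2, 0, None),
]

size, channels = 227, 3
alexnet_shapes = [("input", size, size, channels)]
for label, f, s, p, out_channels in alexnet_layers:
    size = out_size(size, f, s, p)
    if out_channels is not None:
        channels = out_channels
    alexnet_shapes.append((label, size, size, channels))

for label, h, w, c in alexnet_shapes:
    print(f"{label}: {h} x {w} x {c}")

# Length of the flattened vector fed into the first 4096-node layer.
alexnet_flattened = size * size * channels
print("flattened:", alexnet_flattened)
```

Writing the padding as a function of the filter size keeps the "same convolution" assumption visible in one place rather than hard-coding 2 and 1.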
VGG-16
In this neural network all convolution layers use filters of size 3 x 3 with s = 1 and same convolution. All max pooling layers use f = 2 and s = 2.
- An input image of size 224 x 224 x 3 is used.
- Then we go through two conv layers with 64 filters, giving an output of size 224 x 224 x 64.
- Then we use a pooling layer, giving an output of size 112 x 112 x 64.
- Then two conv layers with 128 filters, giving an output of size 112 x 112 x 128.
- Then again we use a pooling layer to give an output of size 56 x 56 x 128.
- Then we use 3 conv layers with 256 filters, then a pooling layer, giving an output of size 28 x 28 x 256.
- Then we use 3 conv layers with 512 filters, then a pooling layer, giving an output of size 14 x 14 x 512.
- Then we use 3 conv layers with 512 filters, then a pooling layer, giving an output of size 7 x 7 x 512.
- Then this goes to an FC layer with 4096 neurons, which is densely connected to another 4096-neuron layer, which in the end is connected to an output softmax layer with 1000 neurons.
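Because every VGG-16 block follows the same pattern (a few 3 x 3 same convolutions, then a 2 x 2 pooling layer), the sizes above can be reproduced with a short loop. This is a minimal sketch in plain Python rather than a real model definition; the name `out_size` and the block list are hypothetical, and padding p = 1 is assumed for the 3 x 3 same convolutions.

```python
def out_size(n, f, s=1, p=0):
    """Spatial size after a conv or pooling window of size f, stride s, padding p."""
    return (n + 2 * p - f) // s + 1


# Each block: (number of 3 x 3 conv layers, number of filters).
vgg16_blocks = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]

size, channels = 224, 3
vgg16_shapes = []  # one (shape after convs, shape after pooling) pair per block
conv_count = 0
for num_convs, filters in vgg16_blocks:
    for _ in range(num_convs):
        size = out_size(size, f=3, s=1, p=1)  # same convolution keeps the size
        channels = filters
        conv_count += 1
    after_convs = (size, size, channels)
    size = out_size(size, f=2, s=2)  # max pooling halves height and width
    vgg16_shapes.append((after_convs, (size, size, channels)))

for after_convs, after_pool in vgg16_shapes:
    print("convs:", after_convs, "-> pool:", after_pool)

# Length of the vector fed into the first 4096-neuron FC layer.
vgg16_flattened = size * size * channels
# Conv layers plus the two 4096-neuron layers and the softmax layer.
weight_layers = conv_count + 3
print("flattened:", vgg16_flattened, "weight layers:", weight_layers)
```

Counting the convolution layers together with the three dense layers gives the 16 weight layers that the name VGG-16 refers to.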