Classic Networks

`LeNet-5`

Here we look at the architecture of the LeNet-5 neural network.

The input is an image of size 32 x 32 x 1. The network was trained on grayscale images, so there is only one channel. LeNet-5 was used to recognize handwritten digits.
The first convolutional layer uses 6 filters of size 5 x 5 with stride s = 1. Its output has size 28 x 28 x 6.
An average pooling layer with f = 2 and s = 2 follows. This gives an output of size 14 x 14 x 6.
The second convolutional layer uses 16 filters of size 5 x 5 and stride s = 1. The output has size 10 x 10 x 16.
Average pooling with f = 2 and s = 2 is applied again. This gives an output of size 5 x 5 x 16.
The next layer is a fully connected layer with 120 neurons. The 5 x 5 x 16 volume is flattened into 400 values, which are densely connected to this layer.
The layer with 120 neurons is then densely connected to a layer with 84 neurons. These 84 features are used to compute the final output.
The output layer has 10 neurons, each corresponding to the probability of the image being a particular digit.
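
The sizes above follow from the usual output-size rule for convolution and pooling layers, floor((n + 2p - f) / s) + 1, with no padding (p = 0). The following is a minimal sketch that reproduces the walkthrough; the helper names `out_size` and `lenet5_shapes` are hypothetical and chosen only for illustration.

```python
def out_size(n, f, s=1, p=0):
    """Spatial size after a conv or pooling layer: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1


def lenet5_shapes(n=32):
    """Trace the activation shapes of LeNet-5 as described in the notes."""
    shapes = [(n, n, 1)]                 # grayscale input
    n = out_size(n, f=5, s=1)            # conv, 6 filters of 5 x 5
    shapes.append((n, n, 6))
    n = out_size(n, f=2, s=2)            # average pooling
    shapes.append((n, n, 6))
    n = out_size(n, f=5, s=1)            # conv, 16 filters of 5 x 5
    shapes.append((n, n, 16))
    n = out_size(n, f=2, s=2)            # average pooling
    shapes.append((n, n, 16))
    shapes.append((n * n * 16,))         # flattened vector
    shapes += [(120,), (84,), (10,)]     # fully connected layers and output
    return shapes


for shape in lenet5_shapes():
    print(shape)
```

The flattened entry is where the 400 values feeding the 120-neuron layer come from: 5 x 5 x 16 = 400.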

The original LeNet-5 has a non-linearity after pooling.

`AlexNet`

AlexNet takes an input image of size 227 x 227 x 3.
The first layer applies 96 filters of size 11 x 11 with s = 4. This gives an output of size 55 x 55 x 96.
We apply max pooling with f = 3 and s = 2. This gives an output of size 27 x 27 x 96.
We then apply a same convolution with 256 filters of size 5 x 5 to get an output of size 27 x 27 x 256.
Again we apply max pooling with f = 3 and s = 2 to give an output of size 13 x 13 x 256.
The third convolution layer uses 384 filters of size 3 x 3 to give an output of size 13 x 13 x 384.
The fourth convolution layer uses 384 filters of size 3 x 3 to give an output of size 13 x 13 x 384.
The fifth convolution layer uses 256 filters of size 3 x 3 to give an output of size 13 x 13 x 256.
Then max pooling with f = 3 and s = 2 gives an output of size 6 x 6 x 256.
This output is flattened into a vector of 9216 values. It is densely connected to a fully connected layer with 4096 nodes, which connects to another 4096-neuron layer, which in turn connects to a softmax layer with 1000 neurons.
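
The same output-size rule, floor((n + 2p - f) / s) + 1, can be used to check the AlexNet sizes. This is a rough sketch, not a reference implementation: the layer names and the table `ALEXNET_LAYERS` are hypothetical, and the padding values are assumptions inferred from the stated sizes (p = 2 for the 5 x 5 same convolution, and p = 1 for the 3 x 3 layers, since their output stays at 13 x 13).

```python
def out_size(n, f, s=1, p=0):
    """Spatial size after a conv or pooling layer: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1


# (name, filter size f, stride s, padding p, output channels or None for pooling)
# Padding values are assumptions inferred from the output sizes in the notes.
ALEXNET_LAYERS = [
    ("conv1", 11, 4, 0, 96),
    ("pool1", 3, 2, 0, None),
    ("conv2", 5, 1, 2, 256),
    ("pool2", 3, 2, 0, None),
    ("conv3", 3, 1, 1, 384),
    ("conv4", 3, 1, 1, 384),
    ("conv5", 3, 1, 1, 256),
    ("pool3", 3, 2, 0, None),
]


def alexnet_shapes(n=227, channels=3):
    """Trace activation shapes through the conv and pooling stack."""
    shapes = {"input": (n, n, channels)}
    for name, f, s, p, out_channels in ALEXNET_LAYERS:
        n = out_size(n, f, s, p)
        if out_channels is not None:     # pooling keeps the channel count
            channels = out_channels
        shapes[name] = (n, n, channels)
    shapes["flatten"] = (n * n * channels,)
    return shapes


for name, shape in alexnet_shapes().items():
    print(f"{name:8s} {shape}")
```

The `flatten` entry gives 6 x 6 x 256 = 9216, the length of the vector fed into the first 4096-node layer.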

The ReLU activation function was used in this neural network.

`VGG-16`

This network uses convolution layers with 3 x 3 filters, s = 1, and same convolution throughout. All max pooling layers have f = 2 and s = 2.

The input is an image of size 224 x 224 x 3.
It goes through two conv layers with 64 filters, giving an output of size 224 x 224 x 64.
A pooling layer then gives an output of size 112 x 112 x 64.
Two conv layers with 128 filters follow, giving an output of size 112 x 112 x 128.
Another pooling layer gives an output of size 56 x 56 x 128.
Then come 3 conv layers with 256 filters and a pooling layer, giving an output of size 28 x 28 x 256.
Then come 3 conv layers with 512 filters and a pooling layer, giving an output of size 14 x 14 x 512.
Then come 3 more conv layers with 512 filters and a pooling layer, giving an output of size 7 x 7 x 512.
This goes to an FC layer with 4096 neurons, which is densely connected to another 4096-neuron layer, which in turn is connected to an output softmax layer with 1000 neurons.
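
Because every conv layer is a 3 x 3 same convolution and every pooling layer halves the spatial size, the whole stack can be described by a short list of (number of conv layers, number of filters) pairs. Below is a minimal sketch under that description; `VGG16_BLOCKS` and `vgg16_shapes` are hypothetical names, and p = 1 is the padding assumed for a 3 x 3 same convolution with s = 1.

```python
# (number of 3 x 3 conv layers, number of filters) for each block,
# each block being followed by one 2 x 2 max pooling layer with s = 2.
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def vgg16_shapes(n=224, channels=3):
    """Return the shape after each block's pooling layer and the conv layer count."""
    shapes = []
    conv_layers = 0
    for num_convs, filters in VGG16_BLOCKS:
        for _ in range(num_convs):
            n = (n + 2 * 1 - 3) // 1 + 1   # same conv: f = 3, s = 1, p = 1 keeps n
            channels = filters
            conv_layers += 1
        n = (n - 2) // 2 + 1               # max pooling: f = 2, s = 2 halves n
        shapes.append((n, n, channels))
    return shapes, conv_layers


shapes, conv_layers = vgg16_shapes()
for shape in shapes:
    print(shape)
print("conv layers:", conv_layers, "| layers with weights:", conv_layers + 3)
```

Counting the 13 conv layers together with the two 4096-neuron layers and the softmax layer gives 16 layers with weights, which matches the 16 in the network's name.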
