One method to apply convolution is to use strides. Usually while performing convolution, if we want to find the value of the next element of a row towards the right direction, then we move the filter towards the right by one column. Similarly, if we want to find the value of an next element in a column, towards the bottom direction we move the filter in the bottom direction by one row.
In strided convolution instead of moving the filter by one column or row, we move it by s columns/rows
So for the [1, 1] element of the we place the filter’s top left corner with the image top left corner. Now for the next element of the resultant i.e. [1, 2], we shift the filter by 2 columns.
Similar for element [2, 1], we shift the filter by 2 rows
In general if we have a n x n image, a f x f filter, if we use padding p and stride s, then the resultant image has dimensions
If the above fraction is not an integer, we round it down. By convention, while performing strided convolution, if after moving the filter by either s columns or rows, if we find the filter having some of its part outside the image, we do not compute that part.
You might have seen the use of the word convolution to represent a different process, but in deep learning literature this is what is defined as convolution. To read more on this go to - Technical note on cross-correlation vs. convolution
Now we know how convolution is performed on two dimension images, let us move to three dimensional volumes (example. RGB images) in Convolutions over Volume