Linear
Linear is the fully connected layer. It takes an input of shape [batch, in_features], multiplies it by a weight matrix, optionally adds a bias, and produces an output of shape [batch, out_features].
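To make the shapes concrete, here is a minimal sketch of the forward computation on plain nested vectors. It is not the library's implementation: the `Matrix` alias, the `linear_forward` name, and the [in_features, out_features] weight layout are assumptions made for illustration (the real layer may store the weight transposed), and an empty bias vector stands in for "no bias".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major matrix type, used only for this illustration.
using Matrix = std::vector<std::vector<float>>;

// Computes y = x * W + b.
//   x:      [batch, in_features]
//   weight: [in_features, out_features]  (assumed layout)
//   bias:   [out_features], or empty to skip the bias term
Matrix linear_forward(const Matrix& x, const Matrix& weight,
                      const std::vector<float>& bias) {
    assert(!weight.empty());
    const std::size_t in_features = weight.size();
    const std::size_t out_features = weight[0].size();
    assert(bias.empty() || bias.size() == out_features);

    Matrix y(x.size(), std::vector<float>(out_features, 0.0f));
    for (std::size_t n = 0; n < x.size(); ++n) {
        assert(x[n].size() == in_features);
        for (std::size_t j = 0; j < out_features; ++j) {
            // Start from the bias (or zero), then accumulate the dot product.
            float acc = bias.empty() ? 0.0f : bias[j];
            for (std::size_t i = 0; i < in_features; ++i) {
                acc += x[n][i] * weight[i][j];
            }
            y[n][j] = acc;
        }
    }
    return y;
}
```

With a 2x2 input and a 2x3 weight, the result is 2x3: the batch dimension passes through untouched and only the feature dimension changes.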
Activation Layers
Activation layers are thin wrappers around the nn::* activation functions. They hold no parameters and no state; their only job is to sit in a Sequential and call the right autograd op when forward() is invoked.
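The wrapper pattern can be sketched as follows. Everything here is a hypothetical stand-in rather than the library's real API: `Tensor` is a plain vector, `relu` plays the role of an nn::* activation function (without autograd), and `Layer` and `Sequential` are reduced to the bare minimum needed to show the delegation.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for the library's tensor type.
using Tensor = std::vector<float>;

// Free function standing in for an nn::* activation op.
Tensor relu(const Tensor& x) {
    Tensor y(x.size());
    std::transform(x.begin(), x.end(), y.begin(),
                   [](float v) { return std::max(v, 0.0f); });
    return y;
}

// Hypothetical layer interface.
struct Layer {
    virtual ~Layer() = default;
    virtual Tensor forward(const Tensor& x) = 0;
};

// The activation layer has no members; forward() only delegates.
struct ReLU : Layer {
    Tensor forward(const Tensor& x) override { return relu(x); }
};

// Minimal Sequential: feeds each layer's output into the next layer.
struct Sequential {
    std::vector<std::unique_ptr<Layer>> layers;

    Tensor forward(Tensor x) {
        for (auto& layer : layers) {
            assert(layer != nullptr);
            x = layer->forward(x);
        }
        return x;
    }
};
```

Because the layer carries no state, the same instance could be reused at several points in a model without any aliasing concerns.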
BatchNorm
Batch Normalisation normalises a layer's inputs across the batch dimension, giving each feature zero mean and unit variance, then rescales and shifts the result with learned parameters (a per-feature scale and shift). The result is faster training, reduced sensitivity to initialisation, and a mild regularisation effect, which is a very good return on investment for two extra tensors.
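The training-mode computation can be sketched in a few lines. This is an illustration under stated assumptions, not the library's code: the names `batch_norm_forward`, `gamma`, `beta`, and `eps` are chosen here, the variance is the biased (divide-by-batch) estimate, and the running statistics that a full implementation would typically keep for evaluation are omitted.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical row-major matrix type: x[n][j] is feature j of sample n.
using Matrix = std::vector<std::vector<float>>;

// Training-mode batch norm over a [batch, features] input.
// gamma (scale) and beta (shift) are the two learned per-feature tensors.
Matrix batch_norm_forward(const Matrix& x, const std::vector<float>& gamma,
                          const std::vector<float>& beta, float eps = 1e-5f) {
    assert(!x.empty());
    const std::size_t batch = x.size();
    const std::size_t features = x[0].size();
    assert(gamma.size() == features && beta.size() == features);

    Matrix y(batch, std::vector<float>(features));
    for (std::size_t j = 0; j < features; ++j) {
        // Mean of feature j across the batch.
        float mean = 0.0f;
        for (std::size_t n = 0; n < batch; ++n) mean += x[n][j];
        mean /= static_cast<float>(batch);

        // Biased variance of feature j across the batch.
        float var = 0.0f;
        for (std::size_t n = 0; n < batch; ++n) {
            const float d = x[n][j] - mean;
            var += d * d;
        }
        var /= static_cast<float>(batch);

        // Normalise, then rescale and shift; eps guards against division by zero.
        const float inv_std = 1.0f / std::sqrt(var + eps);
        for (std::size_t n = 0; n < batch; ++n) {
            y[n][j] = gamma[j] * (x[n][j] - mean) * inv_std + beta[j];
        }
    }
    return y;
}
```

Features on very different scales, such as one column near 1 and another near 100, come out on the same scale, which is one way to see why the layer reduces sensitivity to initialisation.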
Dropout
Dropout randomly zeroes out a fraction of activations during training. Each element is independently set to zero with probability p, and the surviving elements are scaled up by 1/(1-p) so that the expected value of each activation stays the same. During evaluation, dropout is a complete no-op: every neuron is active and nothing is rescaled.
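The behaviour described above can be sketched as a single function. The name `dropout_forward`, the explicit `training` flag, and the caller-supplied `std::mt19937` generator are assumptions made for this example; the library's layer presumably tracks its mode and randomness differently.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Inverted dropout on a flat vector of activations.
//   p:        probability of zeroing each element, in [0, 1)
//   training: when false, the input is returned unchanged
std::vector<float> dropout_forward(const std::vector<float>& x, float p,
                                   bool training, std::mt19937& rng) {
    assert(p >= 0.0f && p < 1.0f);
    if (!training || p == 0.0f) return x;  // evaluation: complete no-op

    // Each element is kept independently with probability 1 - p.
    std::bernoulli_distribution keep(1.0 - p);
    const float scale = 1.0f / (1.0f - p);

    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        // Survivors are scaled up so the expected value matches the input.
        y[i] = keep(rng) ? x[i] * scale : 0.0f;
    }
    return y;
}
```

With p = 0.5, every output is either 0 or twice the input, so the average over many elements stays close to the original average. Scaling at training time is what allows evaluation to skip the layer entirely.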