Layers | GradCore-Tensor

📄️Linear

The fully-connected layer. Takes an input of shape [batch, in_features], multiplies by a weight matrix, optionally adds a bias, and produces output of shape [batch, out_features].
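The shape arithmetic above can be sketched in plain C++ without any GradCore-Tensor types. The `Matrix` struct, the `linear_forward` name, and the `[in_features, out_features]` weight layout are assumptions made for illustration only; the library's own Linear layer may store its weight transposed and will also record the operation for autograd.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major matrix; a stand-in for a real tensor type.
struct Matrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<float> data;  // size == rows * cols

    float& at(std::size_t r, std::size_t c) { return data[r * cols + c]; }
    float at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Computes y = x * w (+ bias).
// x is [batch, in_features]; w is assumed to be [in_features, out_features];
// bias is either empty (no bias) or holds out_features entries.
Matrix linear_forward(const Matrix& x, const Matrix& w,
                      const std::vector<float>& bias) {
    assert(x.cols == w.rows);
    assert(bias.empty() || bias.size() == w.cols);

    Matrix y{x.rows, w.cols, std::vector<float>(x.rows * w.cols, 0.0f)};
    for (std::size_t i = 0; i < x.rows; ++i) {
        for (std::size_t j = 0; j < w.cols; ++j) {
            // Start from the bias term when one is supplied.
            float acc = bias.empty() ? 0.0f : bias[j];
            for (std::size_t k = 0; k < x.cols; ++k) {
                acc += x.at(i, k) * w.at(k, j);
            }
            y.at(i, j) = acc;
        }
    }
    return y;
}
```

Passing an empty `bias` vector models the "optionally adds a bias" case; the output always has one row per input row and one column per output feature.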

📄️Activation Layers

Activation layers are thin wrappers around the nn::* activation functions. They hold no parameters and no state; their only job is to sit in a Sequential and call the right autograd op when forward() is invoked.
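As a rough illustration of what "no parameters and no state" means, the sketch below pairs a free function with an empty struct that only delegates to it. The names `relu` and `ReLULayer` and the use of `std::vector<float>` are assumptions for this example; they are not GradCore-Tensor's actual types, and no autograd bookkeeping is modelled.

```cpp
#include <algorithm>
#include <type_traits>
#include <vector>

// Hypothetical free-function activation: element-wise max(v, 0).
std::vector<float> relu(const std::vector<float>& x) {
    std::vector<float> y(x.size());
    std::transform(x.begin(), x.end(), y.begin(),
                   [](float v) { return std::max(v, 0.0f); });
    return y;
}

// Hypothetical layer wrapper: no data members, forward() only delegates.
struct ReLULayer {
    std::vector<float> forward(const std::vector<float>& x) const {
        return relu(x);
    }
};

// The wrapper carries no state at all.
static_assert(std::is_empty<ReLULayer>::value, "ReLULayer must be stateless");
```

Because the wrapper is empty, two instances are interchangeable and nothing needs to be saved or loaded for it.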

📄️BatchNorm

Batch Normalisation normalises a layer's inputs across the batch dimension, then rescales and shifts them with learned parameters. The result is faster training, reduced sensitivity to initialisation, and a mild regularisation effect, which is a very good return on investment for two extra tensors.
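A minimal sketch of the training-time computation, assuming a row-major `[batch, features]` input: each feature column is normalised with its batch mean and biased variance, then scaled and shifted. The names `gamma`, `beta`, and `eps`, and the function itself, are illustrative assumptions; running statistics for evaluation mode and the backward pass are left out.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// x is row-major [batch, features]; gamma (scale) and beta (shift) hold one
// learned value per feature. eps guards against division by zero.
std::vector<float> batch_norm_forward(const std::vector<float>& x,
                                      std::size_t batch, std::size_t features,
                                      const std::vector<float>& gamma,
                                      const std::vector<float>& beta,
                                      float eps = 1e-5f) {
    assert(batch > 0 && x.size() == batch * features);
    assert(gamma.size() == features && beta.size() == features);

    std::vector<float> y(x.size());
    for (std::size_t f = 0; f < features; ++f) {
        // Mean of this feature over the batch dimension.
        float mean = 0.0f;
        for (std::size_t b = 0; b < batch; ++b) mean += x[b * features + f];
        mean /= static_cast<float>(batch);

        // Biased variance over the batch dimension.
        float var = 0.0f;
        for (std::size_t b = 0; b < batch; ++b) {
            const float d = x[b * features + f] - mean;
            var += d * d;
        }
        var /= static_cast<float>(batch);

        // Normalise, then rescale and shift with the learned parameters.
        const float inv_std = 1.0f / std::sqrt(var + eps);
        for (std::size_t b = 0; b < batch; ++b) {
            const std::size_t i = b * features + f;
            y[i] = gamma[f] * (x[i] - mean) * inv_std + beta[f];
        }
    }
    return y;
}
```

The "two extra tensors" are `gamma` and `beta` here: with `gamma` set to one and `beta` to zero, each output column has approximately zero mean and unit variance.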

📄️Dropout

Dropout randomly zeroes out a fraction of activations during training. Each element is independently set to zero with probability p, and the surviving elements are scaled up by 1/(1-p) so that the expected value of each activation stays the same. During evaluation, dropout is a complete no-op: every neuron is active and no scaling is applied.
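The behaviour described above is often called inverted dropout, and it can be sketched with the standard `<random>` facilities. The function name, the explicit `training` flag, and passing the generator by reference are assumptions for this example rather than the library's interface.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// In training mode, zeroes each element with probability p and scales the
// survivors by 1/(1-p). In evaluation mode, returns the input unchanged.
std::vector<float> dropout_forward(const std::vector<float>& x, float p,
                                   bool training, std::mt19937& rng) {
    assert(p >= 0.0f && p < 1.0f);
    if (!training || p == 0.0f) {
        return x;  // no-op: every element passes through
    }

    std::bernoulli_distribution keep(1.0 - p);  // true with probability 1-p
    const float scale = 1.0f / (1.0f - p);

    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = keep(rng) ? x[i] * scale : 0.0f;
    }
    return y;
}
```

With p = 0.5, every surviving activation is doubled, so the average over many draws matches the unscaled input; that is why no correction is needed at evaluation time.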
