Reduction Modes

Reduction modes control whether a loss function returns a per-element tensor or collapses to a scalar. Every loss function in GradCore-Tensor accepts a Reduction argument.

The Enum

enum Reduction {
    REDUCTION_NONE, // Return a tensor with the same shape as the input
    REDUCTION_MEAN, // Return a scalar: average of all losses
    REDUCTION_SUM,  // Return a scalar: sum of all losses
};

Behaviour by Mode

REDUCTION_NONE

The output tensor has the same shape as the input predictions. Each element gets its own loss value — useful for custom per-sample weighting or analysis.

// pred: shape [4, 1]
// target: shape [4, 1]
// out: shape [4, 1] ← one loss value per sample
Tensor *out = tensor_create(arena, 2, pred->shape);
tensor_mse_loss(out, pred, target, REDUCTION_NONE);
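As a rough illustration of the per-sample weighting use case, the sketch below works on plain `std::vector<float>` buffers rather than the library's `Tensor` type. The helpers `squared_errors` and `weighted_mean_loss` are hypothetical names introduced for this example, not part of GradCore-Tensor; the first stands in for what an unreduced MSE output would contain.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: per-element squared error, standing in for the
// values an unreduced (REDUCTION_NONE) MSE output would hold.
std::vector<float> squared_errors(const std::vector<float>& pred,
                                  const std::vector<float>& target) {
    assert(pred.size() == target.size());
    std::vector<float> out(pred.size());
    for (std::size_t i = 0; i < pred.size(); ++i) {
        const float diff = pred[i] - target[i];
        out[i] = diff * diff;
    }
    return out;
}

// Hypothetical helper: weighted mean of per-sample losses,
// sum(w_i * L_i) / sum(w_i).
float weighted_mean_loss(const std::vector<float>& losses,
                         const std::vector<float>& weights) {
    assert(losses.size() == weights.size());
    float weighted_sum = 0.0f;
    float weight_total = 0.0f;
    for (std::size_t i = 0; i < losses.size(); ++i) {
        weighted_sum += weights[i] * losses[i];
        weight_total += weights[i];
    }
    assert(weight_total > 0.0f);
    return weighted_sum / weight_total;
}
```

A weight of zero drops a sample from the loss entirely, which is one simple way to mask out samples in a custom loop.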

REDUCTION_MEAN

The output is a scalar (size = 1). The formula is:

L = (1/N) * Σ L_i

where N is the total number of elements in the prediction tensor (tensor_cross_entropy_loss is the exception; see "Cross-Entropy: What is N?"). Dividing by N normalises over the batch dimension and every other dimension, so the loss magnitude does not grow with batch size.

Use this for training. It makes hyperparameters (learning rate, etc.) transferable across different batch sizes.

REDUCTION_SUM

The output is a scalar. No normalisation:

L = Σ L_i

The loss grows with batch size. Occasionally useful for specific mathematical formulations but rarely the right choice for training loops.
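To make the difference between the two scalar modes concrete, here is a minimal sketch that reduces a plain vector of per-element losses. The `Reduction` enum is copied from above; `reduce_losses` is a hypothetical helper written for this example and is not a library function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum Reduction {
    REDUCTION_NONE,
    REDUCTION_MEAN,
    REDUCTION_SUM,
};

// Hypothetical helper: collapse per-element losses to a scalar.
// Only the two scalar modes are meaningful here.
float reduce_losses(const std::vector<float>& losses, Reduction reduction) {
    assert(!losses.empty());
    assert(reduction == REDUCTION_MEAN || reduction == REDUCTION_SUM);
    float total = 0.0f;
    for (float loss : losses) {
        total += loss;
    }
    if (reduction == REDUCTION_MEAN) {
        return total / static_cast<float>(losses.size());
    }
    return total;
}
```

Feeding the same four losses twice (a batch of eight) leaves the mean unchanged and doubles the sum, which is why a learning rate tuned under REDUCTION_SUM generally has to be retuned when the batch size changes.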

Effect on Gradients

The reduction mode scales the backward pass gradients:

float scale = 1.0f; // REDUCTION_NONE and REDUCTION_SUM leave the local gradient unscaled
if (reduction == REDUCTION_MEAN) {
    scale = 1.0f / static_cast<float>(pred->size);
}
// gradient = scale * local_gradient

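For REDUCTION_NONE, no reduction takes place, so each element's gradient flows back unscaled and independently of the others.

For REDUCTION_SUM, the scale also stays at 1, so the overall gradient magnitude grows with the number of elements.

For REDUCTION_MEAN, every gradient is divided by N, which keeps gradient magnitudes consistent regardless of how many elements contribute to the loss.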
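The scaling rule can be sketched for MSE, whose local gradient for one element is 2 * (pred - target). The function `mse_grad` below is a hypothetical illustration on plain vectors, not the library's backward pass; for REDUCTION_NONE it assumes an upstream gradient of 1 for every element.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum Reduction {
    REDUCTION_NONE,
    REDUCTION_MEAN,
    REDUCTION_SUM,
};

// Hypothetical sketch: gradient of the MSE loss with respect to pred.
// Assumes an upstream gradient of 1 (per element for REDUCTION_NONE).
std::vector<float> mse_grad(const std::vector<float>& pred,
                            const std::vector<float>& target,
                            Reduction reduction) {
    assert(pred.size() == target.size());
    assert(!pred.empty());
    float scale = 1.0f;
    if (reduction == REDUCTION_MEAN) {
        scale = 1.0f / static_cast<float>(pred.size());
    }
    std::vector<float> grad(pred.size());
    for (std::size_t i = 0; i < pred.size(); ++i) {
        // gradient = scale * local_gradient
        grad[i] = scale * 2.0f * (pred[i] - target[i]);
    }
    return grad;
}
```

With four elements, the REDUCTION_MEAN gradient is exactly one quarter of the REDUCTION_SUM gradient for every element.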

In the nn API

The high-level Model API always uses REDUCTION_MEAN. The enum is exposed at the tensor level for users who call the loss functions directly (e.g. in a custom training loop that needs per-sample losses for sample weighting or curriculum learning).

Cross-Entropy: What is N?

For tensor_cross_entropy_loss, N is num_batches (number of samples), not num_batches * num_classes. The loss for each sample is already summed over classes before the batch reduction:

L_batch = -(1/N) * Σ_batch [ Σ_class target * log(softmax(logits)) ]

This matches the default reduction='mean' behaviour of PyTorch's nn.CrossEntropyLoss.
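The formula can be sketched on flat row-major buffers as follows. The function name `cross_entropy_mean` and its signature are assumptions made for this example and do not reflect the actual tensor_cross_entropy_loss interface; the log-sum-exp shift by the row maximum is a standard numerical-stability step, not something the text above specifies.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch: mean cross-entropy over samples.
// logits and targets are row-major with shape [num_samples, num_classes].
float cross_entropy_mean(const std::vector<float>& logits,
                         const std::vector<float>& targets,
                         std::size_t num_samples,
                         std::size_t num_classes) {
    assert(num_samples > 0 && num_classes > 0);
    assert(logits.size() == num_samples * num_classes);
    assert(targets.size() == logits.size());
    float total = 0.0f;
    for (std::size_t s = 0; s < num_samples; ++s) {
        const std::size_t base = s * num_classes;
        // Shift by the row maximum so exp() cannot overflow.
        float max_logit = logits[base];
        for (std::size_t c = 1; c < num_classes; ++c) {
            if (logits[base + c] > max_logit) max_logit = logits[base + c];
        }
        float sum_exp = 0.0f;
        for (std::size_t c = 0; c < num_classes; ++c) {
            sum_exp += std::exp(logits[base + c] - max_logit);
        }
        const float log_sum_exp = max_logit + std::log(sum_exp);
        // Inner sum over classes: -sum_c target * log(softmax(logits)).
        for (std::size_t c = 0; c < num_classes; ++c) {
            total -= targets[base + c] * (logits[base + c] - log_sum_exp);
        }
    }
    // Divide by the number of samples, not num_samples * num_classes.
    return total / static_cast<float>(num_samples);
}
```

For two samples with uniform logits over two classes and one-hot targets, each sample contributes log(2), so the result is log(2) ≈ 0.693. Dividing by num_samples * num_classes instead would halve it.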