
Loss Functions (nn layer)

The nn loss classes are the user-facing wrappers over the tensor-level tensor_*_loss functions. They inherit from LossFunction, hold a Reduction mode, and expose a single forward() method. They also work as callables via operator().

Header: include/nn/losses/loss.hpp

What they call

Each nn loss class's forward calls the corresponding autograd::*_loss function, which calls tensor_*_loss and wires up a backward node. The nn layer itself contains no mathematical logic — it's a configuration object that packages the right reduction mode with the right underlying operation.


LossFunction — The Base Class

class LossFunction {
public:
    Reduction reduction;

    virtual autograd::Variable *forward(Arena *arena,
                                        autograd::Variable *pred,
                                        autograd::Variable *target) = 0;

    autograd::Variable *operator()(Arena *arena,
                                   autograd::Variable *pred,
                                   autograd::Variable *target);
};

All loss classes inherit from LossFunction. The operator() overload lets you call a loss object like a function:

nn::MSELoss criterion;
auto* loss = criterion(graph_arena, pred, target);
// equivalent to:
// auto* loss = criterion.forward(graph_arena, pred, target);

When to Use Each Loss

| Task | Recommended class |
|---|---|
| Multi-class classification | CrossEntropyLoss |
| Binary classification (logit output) | BCEWithLogitsLoss |
| Binary classification (sigmoid output) | BCELoss |
| Regression, standard | MSELoss |
| Regression, outlier-robust | HuberLoss |
| Regression, heavy outliers | L1Loss / MAELoss |
| Metric learning (pairs) | CosineEmbeddingLoss |
| Metric learning (triplets) | TripletLoss |
| SVM-style binary | HingeLoss |
| Distribution matching | KLDivLoss |
| Weight regularisation | L2Loss |

nn::CrossEntropyLoss

class CrossEntropyLoss : public LossFunction;
// Constructor: CrossEntropyLoss(Reduction red = REDUCTION_MEAN)

Combines log-softmax and negative log-likelihood into a single numerically stable operation. This is the loss to use for multi-class classification.

nn::CrossEntropyLoss criterion;
// pred: [batch, num_classes] — raw logits, no softmax applied
// target: [batch, num_classes] — one-hot encoded
auto* loss = criterion(graph_arena, pred, target);

Passing LossType::CROSS_ENTROPY to nn::Model::compile() selects this class.

Do not apply softmax before this loss. It applies softmax internally.
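To make the "log-softmax plus negative log-likelihood" description concrete, here is a minimal standalone sketch for a single sample. It is not the library's implementation: `cross_entropy_one_hot` is a hypothetical helper operating on plain `std::vector<float>`, and it only illustrates why working from raw logits with the max subtracted stays finite.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Cross-entropy for ONE sample: logits are raw scores, target is one-hot.
// Hypothetical helper for illustration only.
float cross_entropy_one_hot(const std::vector<float> &logits,
                            const std::vector<float> &target) {
    assert(!logits.empty() && logits.size() == target.size());

    // Subtract the largest logit so exp() cannot overflow.
    const float max_logit = *std::max_element(logits.begin(), logits.end());
    float sum_exp = 0.0f;
    for (float z : logits) sum_exp += std::exp(z - max_logit);
    const float log_sum_exp = max_logit + std::log(sum_exp);

    // log_softmax_i = z_i - log_sum_exp
    // loss = -sum_i target_i * log_softmax_i
    float loss = 0.0f;
    for (std::size_t i = 0; i < logits.size(); ++i)
        loss -= target[i] * (logits[i] - log_sum_exp);
    return loss;
}
```

With three equal logits the result is log(3), and a logit as large as 1000 still yields a finite loss, which a separate softmax followed by a log would not guarantee.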


nn::MSELoss

class MSELoss : public LossFunction;
// Constructor: MSELoss(Reduction red = REDUCTION_MEAN)

L = (1/N) * Σ (pred_i - target_i)²

The standard regression loss. Penalises large errors quadratically, making it sensitive to outliers.

nn::MSELoss criterion;
// pred: [batch, 1] — scalar regression output
// target: [batch, 1] — ground-truth values
auto* loss = criterion(graph_arena, pred, target);

Selected by LossType::MSE in Model::compile().


nn::L1Loss

class L1Loss : public LossFunction;
// Constructor: L1Loss(Reduction red = REDUCTION_MEAN)

L = (1/N) * Σ |pred_i - target_i|

Linear penalty. More robust to outliers than MSE.
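The difference in outlier sensitivity between the two formulas can be seen on a tiny example. The sketch below is standalone and uses hypothetical helpers `mse` and `mae` on plain vectors; it mirrors the formulas given for MSELoss and L1Loss with mean reduction, not the library's code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Mean squared error: (1/N) * sum (pred_i - target_i)^2
float mse(const std::vector<float> &pred, const std::vector<float> &target) {
    assert(!pred.empty() && pred.size() == target.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < pred.size(); ++i) {
        const float d = pred[i] - target[i];
        sum += d * d;
    }
    return sum / static_cast<float>(pred.size());
}

// Mean absolute error: (1/N) * sum |pred_i - target_i|
float mae(const std::vector<float> &pred, const std::vector<float> &target) {
    assert(!pred.empty() && pred.size() == target.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < pred.size(); ++i)
        sum += std::fabs(pred[i] - target[i]);
    return sum / static_cast<float>(pred.size());
}
```

For predictions {1, 2, 3, 4} against targets {1, 2, 3, 14}, a single error of 10 gives an MSE of 25 but an MAE of only 2.5.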


nn::MAELoss

class MAELoss : public LossFunction;
// Constructor: MAELoss(Reduction red = REDUCTION_MEAN)

Mean Absolute Error. Identical to L1Loss — both call autograd::l1_loss internally. Two names for user convenience; use whichever is clearer in context.


nn::HuberLoss

class HuberLoss : public LossFunction;
// Constructor: HuberLoss(float delta = 1.0f, Reduction red = REDUCTION_MEAN)

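Quadratic for small errors (|d| ≤ delta, where d = pred - target) and linear for larger ones, so it keeps MSE's smooth behaviour near zero while limiting the influence of outliers the way L1 does. The California Housing tutorial uses this: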

model.compile(nn::OptimizerType::ADAMW,
              nn::LossType::HUBER, // ← HuberLoss(delta=1.0)
              0.001f, 200, 128);

LossType::HUBER in Model::compile() creates HuberLoss with delta = 1.0f.

For a custom delta, construct directly and use Trainer instead of Model:

nn::HuberLoss criterion(2.0f); // delta = 2.0
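
The effect of delta is easiest to see per element. The sketch below uses the textbook Huber definition (0.5·d² inside the threshold, delta·(|d| − 0.5·delta) outside); whether the library uses exactly this scaling is an assumption, and `huber` is a hypothetical standalone helper.

```cpp
#include <cassert>
#include <cmath>

// Per-element Huber loss for an error d = pred - target.
// Textbook definition; the library's exact scaling is assumed, not confirmed.
float huber(float d, float delta = 1.0f) {
    assert(delta > 0.0f);
    const float a = std::fabs(d);
    if (a <= delta) return 0.5f * d * d;  // quadratic region
    return delta * (a - 0.5f * delta);    // linear region
}
```

The two branches meet at |d| = delta, so the loss is continuous there. Raising delta widens the quadratic region, making the loss behave more like MSE.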

nn::BCELoss

class BCELoss : public LossFunction;
// Constructor: BCELoss(Reduction red = REDUCTION_MEAN)

L = -(1/N) * Σ [t*log(p) + (1-t)*log(1-p)]

For binary classification where the model output has already been passed through sigmoid (i.e. pred ∈ (0, 1)).

// model output: sigmoid(logits)
nn::Sigmoid sig;
auto* prob = sig.forward(graph_arena, logits);
nn::BCELoss criterion;
auto* loss = criterion(graph_arena, prob, target);

Selected by LossType::BCE in Model::compile().


nn::BCEWithLogitsLoss

class BCEWithLogitsLoss : public LossFunction;
// Constructor: BCEWithLogitsLoss(Reduction red = REDUCTION_MEAN)

Computes the same loss as BCELoss but takes raw logits and applies the sigmoid internally. It is numerically more stable because it uses the log-sum-exp trick internally. Prefer this over BCELoss with a separate sigmoid.

// model output: raw logits (no sigmoid)
nn::BCEWithLogitsLoss criterion;
auto* loss = criterion(graph_arena, logits, target);

Selected by LossType::BCE_WITH_LOGITS in Model::compile().
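
To show why working from logits is more stable, the sketch below compares a commonly used stable formulation, max(x, 0) − x·t + log(1 + exp(−|x|)), with the naive sigmoid-then-log version. Both functions are hypothetical standalone helpers for a single element; that the library computes exactly this expression is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Stable per-element BCE on a raw logit x with target t in [0, 1].
float bce_with_logits(float x, float t) {
    assert(t >= 0.0f && t <= 1.0f);
    // exp() only ever sees a non-positive argument, so it cannot overflow.
    return std::max(x, 0.0f) - x * t + std::log1p(std::exp(-std::fabs(x)));
}

// Naive version: sigmoid first, then the BCE formula.
float bce_naive(float x, float t) {
    assert(t >= 0.0f && t <= 1.0f);
    const float p = 1.0f / (1.0f + std::exp(-x));
    return -(t * std::log(p) + (1.0f - t) * std::log(1.0f - p));
}
```

For moderate logits the two agree. For a confident wrong prediction such as x = 100 with t = 0, the stable form returns about 100, while in single precision the naive form rounds p to exactly 1 and ends up taking log(0).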


nn::NLLLoss

class NLLLoss : public LossFunction;
// Constructor: NLLLoss(Reduction red = REDUCTION_MEAN)

Negative log-likelihood. Expects pred to be log-probabilities, i.e. the output of log(softmax(x)). CrossEntropyLoss is essentially LogSoftmax followed by NLLLoss, fused into one operation.


nn::KLDivLoss

class KLDivLoss : public LossFunction;
// Constructor: KLDivLoss(Reduction red = REDUCTION_MEAN)

L = Σ target * (log(target) - pred)

Expects pred as log-probabilities and target as probabilities. Use it for knowledge distillation or for models that match one distribution to another (e.g. VAEs, diffusion models).
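
The formula above can be sketched for one pair of distributions as follows. `kl_div` is a hypothetical standalone helper, it returns the plain sum from the formula without any further reduction, and skipping entries where the target is zero (the usual 0·log 0 = 0 convention) is an assumption about edge-case handling.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// KL divergence: sum_i target_i * (log(target_i) - log_pred_i).
// log_pred holds log-probabilities, target holds probabilities.
float kl_div(const std::vector<float> &log_pred,
             const std::vector<float> &target) {
    assert(log_pred.size() == target.size());
    float loss = 0.0f;
    for (std::size_t i = 0; i < target.size(); ++i) {
        // Assumed convention: terms with target_i == 0 contribute nothing.
        if (target[i] > 0.0f)
            loss += target[i] * (std::log(target[i]) - log_pred[i]);
    }
    return loss;
}
```

Identical distributions give zero, and a one-hot target against a uniform two-class prediction gives log(2).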


nn::HingeLoss

class HingeLoss : public LossFunction;
// Constructor: HingeLoss(Reduction red = REDUCTION_MEAN)

L = (1/N) * Σ max(0, 1 - pred * target)

SVM-style loss. target must be ±1 (not 0/1).


nn::L2Loss

class L2Loss : public LossFunction;
// Constructor: L2Loss(Reduction red = REDUCTION_MEAN)

L = (1/N) * Σ 0.5 * w²

L2 weight regularisation as a loss term. Generally prefer AdamW's built-in weight_decay parameter for regularisation instead.


nn::CosineEmbeddingLoss

class CosineEmbeddingLoss : public LossFunction;
// Constructor: CosineEmbeddingLoss(float margin = 0.0f,
// Reduction red = REDUCTION_MEAN)

For learning embeddings where pairs of inputs should be similar (target=1) or dissimilar (target=-1). This loss takes three inputs (x1, x2, target), so call forward_triplet rather than the two-input forward:

nn::CosineEmbeddingLoss criterion(0.5f); // margin = 0.5
auto* loss = criterion.forward_triplet(graph_arena, x1, x2, target);

Calling the two-input forward (the override of the base-class method) prints a warning and returns nullptr.
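
This section does not state the formula. A widely used definition (the one documented for PyTorch's loss of the same name) is 1 − cos(x1, x2) for target = 1 and max(0, cos(x1, x2) − margin) for target = −1. The sketch below implements that definition for a single pair; that this library follows it exactly is an assumption, and both functions are hypothetical standalone helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Cosine similarity of two non-zero vectors of equal length.
float cosine_similarity(const std::vector<float> &a,
                        const std::vector<float> &b) {
    assert(a.size() == b.size());
    float dot = 0.0f, na = 0.0f, nb = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        na += a[i] * a[i];
        nb += b[i] * b[i];
    }
    assert(na > 0.0f && nb > 0.0f);
    return dot / (std::sqrt(na) * std::sqrt(nb));
}

// Conventional cosine embedding loss for one pair; target must be +1 or -1.
float cosine_embedding(const std::vector<float> &x1,
                       const std::vector<float> &x2,
                       float target, float margin = 0.0f) {
    assert(target == 1.0f || target == -1.0f);
    const float c = cosine_similarity(x1, x2);
    return target > 0.0f ? 1.0f - c : std::max(0.0f, c - margin);
}
```

Under this definition, a dissimilar pair is only penalised while its cosine similarity exceeds the margin.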


nn::TripletLoss

class TripletLoss : public LossFunction;
// Constructor: TripletLoss(float margin = 1.0f,
// Reduction red = REDUCTION_MEAN)

L = max(0, dist(anchor, pos) - dist(anchor, neg) + margin)

For metric learning with triplets of (anchor, positive, negative) samples. Use forward_triplet:

nn::TripletLoss criterion(1.0f);
auto* loss = criterion.forward_triplet(graph_arena, anchor, positive, negative);
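
The triplet formula can be sketched for a single triplet as follows. The choice of Euclidean distance for dist is an assumption (the section does not say which metric the library uses), and `euclidean` and `triplet` are hypothetical standalone helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Euclidean distance between two vectors of equal length.
float euclidean(const std::vector<float> &a, const std::vector<float> &b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Triplet loss for one (anchor, positive, negative) triplet:
// max(0, dist(anchor, pos) - dist(anchor, neg) + margin)
float triplet(const std::vector<float> &anchor,
              const std::vector<float> &pos,
              const std::vector<float> &neg,
              float margin = 1.0f) {
    return std::max(0.0f,
                    euclidean(anchor, pos) - euclidean(anchor, neg) + margin);
}
```

The loss is zero once the negative is at least margin farther from the anchor than the positive, so well-separated triplets stop contributing.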

Using Loss Functions Directly (Outside Model)

If you're using Trainer directly or writing a custom training loop:

nn::CrossEntropyLoss criterion;

// In your training loop:
auto* pred = model_seq->forward(graph_arena, x);
auto* loss = criterion.forward(graph_arena, pred, y);

// Read the scalar loss value from the tensor's underlying storage.
float loss_val = loss->data->storage->data[loss->data->offset];

optimizer.zero_grad();
autograd::backward(graph_arena, loss);
optimizer.step(graph_arena);

For the high-level path, see Model and Trainer.