Loss Functions (`nn` layer)

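The nn loss classes are the user-facing wrappers over the tensor-level tensor_*_loss functions, which they reach through the autograd::*_loss layer. They inherit from LossFunction, hold a Reduction mode, and expose forward() as their main entry point; operator() makes each object callable as well. CosineEmbeddingLoss and TripletLoss take three inputs and are called through forward_triplet() instead.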

Header: include/nn/losses/loss.hpp

What they call

Each nn loss class's forward calls the corresponding autograd::*_loss function, which calls tensor_*_loss and wires up a backward node. The nn layer itself contains no mathematical logic — it's a configuration object that packages the right reduction mode with the right underlying operation.

`LossFunction` — The Base Class

class LossFunction {
public:
    Reduction reduction;

    virtual autograd::Variable *forward(Arena *arena,
                                        autograd::Variable *pred,
                                        autograd::Variable *target) = 0;

    autograd::Variable *operator()(Arena *arena,
                                   autograd::Variable *pred,
                                   autograd::Variable *target);
};

All loss classes inherit from LossFunction. The operator() overload lets you call a loss object like a function:

nn::MSELoss criterion;
auto* loss = criterion(graph_arena, pred, target);
// equivalent to:
// auto* loss = criterion.forward(graph_arena, pred, target);
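The forwarding pattern behind the callable syntax can be sketched in isolation. The block below is a minimal, self-contained illustration: `Arena`, `Variable`, `LossSketch`, and `SquaredErrorSketch` are hypothetical stand-ins invented for this example, not the library's definitions, and the real `operator()` is only assumed to delegate to `forward()` in the same way.

```cpp
#include <cassert>

// Stand-in types: placeholders so the sketch compiles on its own.
// They are NOT the library's Arena or autograd::Variable.
struct Arena {};
struct Variable { float value; };

class LossSketch {
public:
    virtual ~LossSketch() = default;

    // Subclasses implement the actual computation here.
    virtual Variable *forward(Arena *arena, Variable *pred, Variable *target) = 0;

    // The call operator simply delegates to forward().
    Variable *operator()(Arena *arena, Variable *pred, Variable *target) {
        return forward(arena, pred, target);
    }
};

// Hypothetical subclass: squared error on a single scalar.
class SquaredErrorSketch : public LossSketch {
public:
    Variable *forward(Arena *, Variable *pred, Variable *target) override {
        assert(pred != nullptr && target != nullptr);
        const float d = pred->value - target->value;
        result_.value = d * d;
        return &result_;
    }

private:
    Variable result_{0.0f};
};
```

Because `operator()` is defined once in the base class, every subclass becomes callable without repeating the overload.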

When to Use Each Loss

| Task | Recommended class |
| --- | --- |
| Multi-class classification | `CrossEntropyLoss` |
| Binary classification (logit output) | `BCEWithLogitsLoss` |
| Binary classification (sigmoid output) | `BCELoss` |
| Regression, standard | `MSELoss` |
| Regression, outlier-robust | `HuberLoss` |
| Regression, heavy outliers | `L1Loss` / `MAELoss` |
| Metric learning (pairs) | `CosineEmbeddingLoss` |
| Metric learning (triplets) | `TripletLoss` |
| SVM-style binary | `HingeLoss` |
| Distribution matching | `KLDivLoss` |
| Weight regularisation | `L2Loss` |

`nn::CrossEntropyLoss`

class CrossEntropyLoss : public LossFunction;
// Constructor: CrossEntropyLoss(Reduction red = REDUCTION_MEAN)

Combines log-softmax and negative log-likelihood into a single numerically stable operation. This is the loss to use for multi-class classification.

nn::CrossEntropyLoss criterion;
// pred:   [batch, num_classes]  — raw logits, no softmax applied
// target: [batch, num_classes]  — one-hot encoded
auto* loss = criterion(graph_arena, pred, target);

The LossType::CROSS_ENTROPY enum value in nn::Model::compile() uses this class.

Do not apply softmax before this loss. It applies log-softmax internally, so a second softmax would distort the result.
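To see why fusing log-softmax with the negative log-likelihood helps numerically, here is a minimal standalone sketch for a single sample. It uses plain `std::vector<float>` instead of the library's tensors, and the function name `cross_entropy_one_hot` is hypothetical; the library's own kernel may be organised differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Cross-entropy for one sample: logits are raw scores, target is one-hot.
// Standalone illustration; not the library's implementation.
inline float cross_entropy_one_hot(const std::vector<float> &logits,
                                   const std::vector<float> &target) {
    assert(!logits.empty() && logits.size() == target.size());

    // Subtracting the maximum keeps exp() from overflowing.
    const float m = *std::max_element(logits.begin(), logits.end());
    float sum_exp = 0.0f;
    for (float z : logits) sum_exp += std::exp(z - m);
    const float log_sum_exp = m + std::log(sum_exp);

    // loss = -sum_i target_i * log_softmax(logits)_i
    float loss = 0.0f;
    for (std::size_t i = 0; i < logits.size(); ++i)
        loss -= target[i] * (logits[i] - log_sum_exp);
    return loss;
}
```

With logits such as `{1000, 0}`, a separate softmax followed by a log would overflow in `exp(1000)`, while the shifted form above stays finite.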

`nn::MSELoss`

class MSELoss : public LossFunction;
// Constructor: MSELoss(Reduction red = REDUCTION_MEAN)

L = (1/N) * Σ (pred_i - target_i)²

The standard regression loss. Penalises large errors quadratically, making it sensitive to outliers.

nn::MSELoss criterion;
// pred:   [batch, 1]   — scalar regression output
// target: [batch, 1]   — ground-truth values
auto* loss = criterion(graph_arena, pred, target);

Selected by LossType::MSE in Model::compile().

`nn::L1Loss`

class L1Loss : public LossFunction;
// Constructor: L1Loss(Reduction red = REDUCTION_MEAN)

L = (1/N) * Σ |pred_i - target_i|

Linear penalty. More robust to outliers than MSE.
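The difference in outlier sensitivity between the quadratic and linear penalties is easy to check by hand. The sketch below implements both formulas with mean reduction on plain vectors; `mse_sketch` and `l1_sketch` are hypothetical names for this illustration and do not touch the library.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Mean squared error over a flat vector (mean reduction).
inline float mse_sketch(const std::vector<float> &pred,
                        const std::vector<float> &target) {
    assert(!pred.empty() && pred.size() == target.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < pred.size(); ++i) {
        const float d = pred[i] - target[i];
        sum += d * d;
    }
    return sum / static_cast<float>(pred.size());
}

// Mean absolute error over a flat vector (mean reduction).
inline float l1_sketch(const std::vector<float> &pred,
                       const std::vector<float> &target) {
    assert(!pred.empty() && pred.size() == target.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < pred.size(); ++i)
        sum += std::fabs(pred[i] - target[i]);
    return sum / static_cast<float>(pred.size());
}
```

For targets `{1, 2, 3, 4}` and predictions `{1, 2, 3, 14}`, a single error of 10 gives an MSE of 25 but an L1 loss of only 2.5, which is why one bad sample dominates the gradient far more under MSE.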

`nn::MAELoss`

class MAELoss : public LossFunction;
// Constructor: MAELoss(Reduction red = REDUCTION_MEAN)

Mean Absolute Error. Identical to L1Loss — both call autograd::l1_loss internally. Two names for user convenience; use whichever is clearer in context.

`nn::HuberLoss`

class HuberLoss : public LossFunction;
// Constructor: HuberLoss(float delta = 1.0f, Reduction red = REDUCTION_MEAN)

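Quadratic for small errors (|d| ≤ delta) and linear for large errors, where d = pred - target. This keeps MSE's smooth gradient near zero while limiting the influence of outliers the way L1 does. The California Housing tutorial uses this: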

model.compile(nn::OptimizerType::ADAMW,
              nn::LossType::HUBER,      // ← HuberLoss(delta=1.0)
              0.001f, 200, 128);

LossType::HUBER in Model::compile() creates HuberLoss with delta = 1.0f.

For a custom delta, construct directly and use Trainer instead of Model:

nn::HuberLoss criterion(2.0f);   // delta = 2.0
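The effect of delta is easiest to see per element. The sketch below uses the standard Huber definition, 0.5·d² inside the threshold and delta·(|d| − 0.5·delta) outside it. That the library uses exactly these constants is an assumption, and `huber_sketch` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>

// Per-element Huber penalty for an error d = pred - target.
// Standard textbook form; assumed, not confirmed, to match the library.
inline float huber_sketch(float d, float delta = 1.0f) {
    assert(delta > 0.0f);
    const float a = std::fabs(d);
    if (a <= delta) return 0.5f * d * d;   // quadratic region
    return delta * (a - 0.5f * delta);     // linear region
}
```

With this form the two branches meet at |d| = delta, so the penalty is continuous. An error of 3 costs 2.5 with delta = 1 and 4.0 with delta = 2: a larger delta widens the quadratic region and makes the loss behave more like MSE.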

`nn::BCELoss`

class BCELoss : public LossFunction;
// Constructor: BCELoss(Reduction red = REDUCTION_MEAN)

L = -(1/N) * Σ [t*log(p) + (1-t)*log(1-p)]

For binary classification where the model output has already been passed through sigmoid (i.e. pred ∈ (0, 1)).

// model output: sigmoid(logits)
nn::Sigmoid sig;
auto* prob = sig.forward(graph_arena, logits);
nn::BCELoss criterion;
auto* loss = criterion(graph_arena, prob, target);

Selected by LossType::BCE in Model::compile().

`nn::BCEWithLogitsLoss`

class BCEWithLogitsLoss : public LossFunction;
// Constructor: BCEWithLogitsLoss(Reduction red = REDUCTION_MEAN)

Same as BCELoss but accepts raw logits (not sigmoid-ed). Numerically more stable because it uses the log-sum-exp trick internally. Prefer this over BCELoss + separate sigmoid.

// model output: raw logits (no sigmoid)
nn::BCEWithLogitsLoss criterion;
auto* loss = criterion(graph_arena, logits, target);

Selected by LossType::BCE_WITH_LOGITS in Model::compile().
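The stability argument can be illustrated with a per-element sketch. The stable form below rewrites log(1 + e^x) as max(x, 0) + log1p(e^(-|x|)), which is the usual log-sum-exp rearrangement; whether the library uses this exact expression is an assumption, and both function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Naive form: sigmoid first, then the BCE formula.
// For large |x| the sigmoid saturates to 0 or 1 and log() can blow up.
inline float bce_naive_sketch(float x, float t) {
    const float p = 1.0f / (1.0f + std::exp(-x));
    return -(t * std::log(p) + (1.0f - t) * std::log(1.0f - p));
}

// Stable form working directly on the logit x with target t in [0, 1]:
// loss = max(x, 0) - x * t + log(1 + exp(-|x|))
inline float bce_with_logits_sketch(float x, float t) {
    assert(t >= 0.0f && t <= 1.0f);
    return std::max(x, 0.0f) - x * t + std::log1p(std::exp(-std::fabs(x)));
}
```

For moderate logits the two functions agree. For a confidently wrong prediction such as x = 100 with t = 0, the stable form returns a value close to 100, whereas the naive form depends on log(1 - p) with p rounded to 1.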

`nn::NLLLoss`

class NLLLoss : public LossFunction;
// Constructor: NLLLoss(Reduction red = REDUCTION_MEAN)

Negative log-likelihood. Expects pred to be log-probabilities (the output of log(softmax(x))). CrossEntropyLoss is essentially LogSoftmax followed by NLLLoss, i.e. NLLLoss(LogSoftmax(logits), target), fused into one operation.

`nn::KLDivLoss`

class KLDivLoss : public LossFunction;
// Constructor: KLDivLoss(Reduction red = REDUCTION_MEAN)

L = Σ target * (log(target) - pred)

Expects pred as log-probabilities and target as probabilities. Use it for knowledge distillation or for matching distributions (VAEs, diffusion models).
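The formula above can be written out directly. This standalone sketch returns the unreduced sum from the formula and skips entries where target is zero, following the usual 0·log 0 = 0 convention. How the library handles zero targets and applies its Reduction mode is not covered here, and `kl_div_sketch` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// KL divergence with log_pred holding log-probabilities and target
// holding probabilities: sum_i target_i * (log(target_i) - log_pred_i).
inline float kl_div_sketch(const std::vector<float> &log_pred,
                           const std::vector<float> &target) {
    assert(log_pred.size() == target.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < target.size(); ++i) {
        // Zero-probability targets contribute nothing (0 * log 0 := 0).
        if (target[i] > 0.0f)
            sum += target[i] * (std::log(target[i]) - log_pred[i]);
    }
    return sum;
}
```

The result is zero when the two distributions match. For a uniform two-class prediction against a target of `{1, 0}` it is log 2.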

`nn::HingeLoss`

class HingeLoss : public LossFunction;
// Constructor: HingeLoss(Reduction red = REDUCTION_MEAN)

L = (1/N) * Σ max(0, 1 - pred * target)

SVM-style loss. target must be ±1 (not 0/1).
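Since labels often arrive as 0/1, a conversion step is usually needed before computing the hinge loss. The sketch below shows the mapping and the formula with mean reduction on plain vectors; `to_plus_minus_one` and `hinge_sketch` are hypothetical helpers, not library functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Map a 0/1 class label to the -1/+1 encoding the hinge loss expects.
inline float to_plus_minus_one(int label01) {
    assert(label01 == 0 || label01 == 1);
    return label01 == 1 ? 1.0f : -1.0f;
}

// Mean hinge loss: (1/N) * sum max(0, 1 - pred_i * target_i).
inline float hinge_sketch(const std::vector<float> &pred,
                          const std::vector<float> &target) {
    assert(!pred.empty() && pred.size() == target.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < pred.size(); ++i)
        sum += std::max(0.0f, 1.0f - pred[i] * target[i]);
    return sum / static_cast<float>(pred.size());
}
```

A prediction of 2 for a +1 target lies outside the margin and costs nothing, while a prediction of -0.5 for a -1 target is on the correct side but inside the margin and costs 0.5. With 0/1 targets the product would be zero for every negative sample, so negatives would incur a constant loss of 1 regardless of the prediction.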

`nn::L2Loss`

class L2Loss : public LossFunction;
// Constructor: L2Loss(Reduction red = REDUCTION_MEAN)

L = (1/N) * Σ 0.5 * w²

L2 weight regularisation as a loss term. Generally prefer AdamW's built-in weight_decay parameter for regularisation instead.

`nn::CosineEmbeddingLoss`

class CosineEmbeddingLoss : public LossFunction;
// Constructor: CosineEmbeddingLoss(float margin = 0.0f,
//                                  Reduction red = REDUCTION_MEAN)

For learning embeddings where pairs of inputs should be similar (target=1) or dissimilar (target=-1). Use the forward_triplet method, not forward:

nn::CosineEmbeddingLoss criterion(0.5f);   // margin = 0.5
auto* loss = criterion.forward_triplet(graph_arena, x1, x2, target);

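Calling forward (this class's override of the base-class method) prints a warning and returns nullptr, so check that you are not invoking the loss through operator() or forward by mistake.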
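This section does not give a formula for the loss, so the sketch below uses the commonly used definition: 1 − cos(x1, x2) for similar pairs (target = 1) and max(0, cos(x1, x2) − margin) for dissimilar pairs (target = −1). That the library follows this exact definition is an assumption, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Cosine similarity of two equal-length vectors.
inline float cosine_similarity_sketch(const std::vector<float> &a,
                                      const std::vector<float> &b) {
    assert(!a.empty() && a.size() == b.size());
    float dot = 0.0f, na = 0.0f, nb = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        na += a[i] * a[i];
        nb += b[i] * b[i];
    }
    // Small floor on the denominator avoids division by zero.
    return dot / std::max(std::sqrt(na) * std::sqrt(nb), 1e-8f);
}

// Commonly used cosine embedding loss for one pair (assumed definition).
inline float cosine_embedding_sketch(const std::vector<float> &x1,
                                     const std::vector<float> &x2,
                                     int target, float margin = 0.0f) {
    assert(target == 1 || target == -1);
    const float c = cosine_similarity_sketch(x1, x2);
    return target == 1 ? 1.0f - c : std::max(0.0f, c - margin);
}
```

Under this definition the margin only affects dissimilar pairs: with margin = 0.5, two identical vectors labelled −1 cost 0.5, while orthogonal vectors labelled −1 cost nothing.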

`nn::TripletLoss`

class TripletLoss : public LossFunction;
// Constructor: TripletLoss(float margin = 1.0f,
//                          Reduction red = REDUCTION_MEAN)

L = max(0, dist(anchor, pos) - dist(anchor, neg) + margin)

For metric learning with triplets of (anchor, positive, negative) samples. Use forward_triplet:

nn::TripletLoss criterion(1.0f);
auto* loss = criterion.forward_triplet(graph_arena, anchor, positive, negative);
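The triplet formula can be checked on small vectors. The sketch below assumes dist is the Euclidean distance, which this section does not state, and works on a single triplet with plain vectors; `euclidean_sketch` and `triplet_sketch` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Euclidean distance between two equal-length vectors (assumed metric).
inline float euclidean_sketch(const std::vector<float> &a,
                              const std::vector<float> &b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// max(0, dist(anchor, pos) - dist(anchor, neg) + margin) for one triplet.
inline float triplet_sketch(const std::vector<float> &anchor,
                            const std::vector<float> &positive,
                            const std::vector<float> &negative,
                            float margin = 1.0f) {
    const float d_pos = euclidean_sketch(anchor, positive);
    const float d_neg = euclidean_sketch(anchor, negative);
    return std::max(0.0f, d_pos - d_neg + margin);
}
```

The loss reaches zero once the negative is at least `margin` farther from the anchor than the positive, so well-separated triplets stop contributing gradient.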

Using Loss Functions Directly (Outside `Model`)

If you're using Trainer directly or writing a custom training loop:

nn::CrossEntropyLoss criterion;

// In your training loop:
auto* pred = model_seq->forward(graph_arena, x);
auto* loss = criterion.forward(graph_arena, pred, y);

float loss_val = loss->data->storage->data[loss->data->offset];

optimizer.zero_grad();
autograd::backward(graph_arena, loss);
optimizer.step(graph_arena);

For the high-level path, see Model and Trainer.
