
nn::Trainer<OptimizerType, LossType>

Trainer is the training loop engine. It owns no model weights — it receives a Module*, an optimizer, a loss function, and a graph arena, then runs the forward → loss → backward → step cycle for as many epochs as you ask for.

nn::Model::train() is a wrapper around Trainer. You only need to use Trainer directly when you want a combination of optimizer and loss that Model doesn't expose, or when you need full control over the training loop.

Header: include/nn/training/trainer.hpp
Namespace: gradientcore::nn

What it calls

On every batch, Trainer calls module->forward(graph_arena, x) → criterion->forward(graph_arena, pred, y) → autograd::backward(graph_arena, loss) → optimizer.step(graph_arena) → graph_arena->pop_to(saved_pos). The graph arena is rewound after every batch, so all intermediate tensors and graph nodes are freed automatically.
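To make the rewind idea concrete, here is a minimal sketch of a bump allocator with a save/rewind pair. BumpArena, push, and run_batches are hypothetical names for illustration only; just get_pos and pop_to mirror calls named on this page, and the library's Arena may well differ (this sketch ignores alignment, for instance).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the library's Arena: a bump allocator with a
// movable "top" offset. Only get_pos/pop_to mirror names used on this page.
class BumpArena {
public:
    explicit BumpArena(std::size_t capacity) : buffer_(capacity), top_(0) {}

    // Reserve `bytes` from the arena; returns nullptr when the arena is full.
    void *push(std::size_t bytes) {
        if (bytes > buffer_.size() - top_) return nullptr;
        void *p = buffer_.data() + top_;
        top_ += bytes;
        return p;
    }

    std::size_t get_pos() const { return top_; }

    // Rewind to a previously saved position, releasing everything after it.
    void pop_to(std::size_t pos) {
        if (pos <= top_) top_ = pos;
    }

private:
    std::vector<unsigned char> buffer_;
    std::size_t top_;
};

// Simulate `batches` training steps that each allocate `bytes_per_batch`
// of scratch memory, and return the arena position afterwards.
std::size_t run_batches(BumpArena &arena, int batches, std::size_t bytes_per_batch) {
    for (int b = 0; b < batches; ++b) {
        std::size_t saved = arena.get_pos();  // position before the batch
        arena.push(bytes_per_batch);          // activations, graph nodes, ...
        arena.pop_to(saved);                  // release everything from this batch
    }
    return arena.get_pos();
}
```

Because each batch ends where it started, memory use stays bounded by the largest single batch rather than growing with the number of batches.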


Class Template

template <typename OptimizerType, typename LossType>
class Trainer {
public:
    Trainer(Module *model,
            OptimizerType *optimizer,
            LossType *criterion,
            Arena *graph_arena);

    void set_verbose(bool v);

    bool validate_data(const std::vector<std::vector<float>> &X,
                       const std::vector<std::vector<float>> &Y);

    // Train from raw vectors
    TrainingStats fit(const std::vector<std::vector<float>> &X_train,
                      const std::vector<std::vector<float>> &Y_train,
                      uint32_t epochs,
                      uint32_t batch_size = 32,
                      uint32_t log_interval = 100);

    // Train from a DataLoader
    TrainingStats fit_dataloader(data::DataLoader *dataloader,
                                 uint32_t epochs,
                                 uint32_t log_interval = 100);

    // Evaluate from raw vectors
    float evaluate(const std::vector<std::vector<float>> &X_test,
                   const std::vector<std::vector<float>> &Y_test);

    // Evaluate from a DataLoader
    float evaluate_dataloader(data::DataLoader *dataloader);
};

OptimizerType must have zero_grad() and step(Arena*) methods (all optimizers in gradientcore::optim satisfy this). LossType must have a forward(Arena*, Variable*, Variable*) method (all nn::LossFunction subclasses satisfy this).
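If you write your own optimizer or loss, you can check these requirements at compile time before handing the type to Trainer. The following is a sketch using the C++17 detection idiom. Arena and Variable here are empty stand-ins so the example compiles on its own, and is_optimizer_like, is_loss_like, MySGD, MyLoss, and NotAnOptimizer are hypothetical names that are not part of the library.

```cpp
#include <type_traits>
#include <utility>

// Stand-in declarations so the sketch compiles on its own; the real types
// live in the library.
struct Arena {};
struct Variable {};

// True when T has zero_grad() and step(Arena*).
template <typename T, typename = void>
struct is_optimizer_like : std::false_type {};

template <typename T>
struct is_optimizer_like<
    T, std::void_t<decltype(std::declval<T &>().zero_grad()),
                   decltype(std::declval<T &>().step(std::declval<Arena *>()))>>
    : std::true_type {};

// True when T has forward(Arena*, Variable*, Variable*).
template <typename T, typename = void>
struct is_loss_like : std::false_type {};

template <typename T>
struct is_loss_like<
    T, std::void_t<decltype(std::declval<T &>().forward(
           std::declval<Arena *>(), std::declval<Variable *>(),
           std::declval<Variable *>()))>> : std::true_type {};

// Hypothetical user-defined types that satisfy the requirements.
struct MySGD {
    void zero_grad() {}
    void step(Arena *) {}
};
struct MyLoss {
    Variable *forward(Arena *, Variable *, Variable *) { return nullptr; }
};

// A type that is missing step(Arena*).
struct NotAnOptimizer {
    void zero_grad() {}
};

static_assert(is_optimizer_like<MySGD>::value, "MySGD should qualify");
static_assert(is_loss_like<MyLoss>::value, "MyLoss should qualify");
static_assert(!is_optimizer_like<NotAnOptimizer>::value, "missing step(Arena*)");
```

A failed static_assert here gives a much shorter error than the template instantiation errors you would otherwise get from inside the training loop.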


Constructor

Trainer<optim::Adam, nn::CrossEntropyLoss> trainer(
    model_seq,      // nn::Module*: the Sequential or any other Module
    &adam_opt,      // OptimizerType*: already constructed with params
    &cross_entropy, // LossType*
    graph_arena     // Arena*: the graph arena
);

The constructor prints errors to stderr if any pointer is null, but does not abort. A null pointer will cause a crash when fit is called — construct your objects before creating the Trainer.


set_verbose(bool)

trainer.set_verbose(true); // print epoch loss (default: true)
trainer.set_verbose(false); // silent training

When verbose = true, the trainer prints a configuration summary before training starts and logs loss at the epoch interval set by log_interval.


validate_data

bool ok = trainer.validate_data(X, Y);

Checks that:

  • X and Y are non-empty.
  • X.size() == Y.size() (same number of samples).
  • All rows in X have the same number of columns.
  • All rows in Y have the same number of columns.

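Called automatically by fit(). Because validate_data is a member function, it needs a constructed Trainer; call it yourself when you want to check the data up front, for example before starting a long training run.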
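The four checks listed above can be sketched as a free function. This is an illustration of the listed rules, not the library's implementation; Matrix, rows_are_rectangular, and looks_valid are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;

// True when every row has the same length as the first row.
static bool rows_are_rectangular(const Matrix &m) {
    for (const auto &row : m)
        if (row.size() != m.front().size()) return false;
    return true;
}

// Applies the four documented checks in order.
bool looks_valid(const Matrix &X, const Matrix &Y) {
    if (X.empty() || Y.empty()) return false;  // both non-empty
    if (X.size() != Y.size()) return false;    // same number of samples
    return rows_are_rectangular(X) && rows_are_rectangular(Y);
}
```

A ragged input such as {{1, 2}, {3}} fails the third check, and mismatched sample counts fail the second.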


fit — Train from Raw Vectors

TrainingStats fit(const std::vector<std::vector<float>> &X_train,
                  const std::vector<std::vector<float>> &Y_train,
                  uint32_t epochs,
                  uint32_t batch_size = 32,
                  uint32_t log_interval = 100);

Parameter       Description
X_train         Input features: [num_samples][num_features]
Y_train         Targets: [num_samples][output_dim]
epochs          Number of full passes over the data
batch_size      Samples per gradient update step
log_interval    Print loss every N epochs (and always on the last epoch)

What happens per batch

1.  Save graph arena position:  pos = graph_arena->get_pos()
2.  Copy batch into tensors:    t_x[batch_size × input_dim],
                                t_y[batch_size × output_dim]
3.  Wrap as Variables:          x = create_leaf(graph_arena, t_x, false)
                                y = create_leaf(graph_arena, t_y, false)
4.  Forward pass:               pred = model->forward(graph_arena, x)
5.  Loss:                       loss = criterion->forward(graph_arena, pred, y)
6.  Read scalar loss value
7.  Zero gradients:             optimizer->zero_grad()
8.  Backward:                   autograd::backward(graph_arena, loss)
9.  Update weights:             optimizer->step(graph_arena)
10. Free batch graph:           graph_arena->pop_to(pos)
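Step 2 above amounts to packing a slice of the nested vectors into one contiguous row-major buffer. The sketch below shows one way to do that with plain std::vector. batches_per_epoch and pack_batch are hypothetical helpers, and whether the trainer keeps or drops a final partial batch is not stated here, so the ceiling division is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;

// Number of batches per epoch, assuming a final partial batch is kept
// (ceiling division). Returns 0 for a zero batch size.
std::size_t batches_per_epoch(std::size_t num_samples, std::size_t batch_size) {
    if (batch_size == 0) return 0;
    return (num_samples + batch_size - 1) / batch_size;
}

// Copy rows [start, start + batch_size) of `data` into one contiguous
// row-major buffer, clipping at the end of the data set.
std::vector<float> pack_batch(const Matrix &data, std::size_t start,
                              std::size_t batch_size) {
    const std::size_t end = std::min(start + batch_size, data.size());
    std::vector<float> flat;
    for (std::size_t r = start; r < end; ++r)
        flat.insert(flat.end(), data[r].begin(), data[r].end());
    return flat;
}
```

Under that assumption, 60000 samples with a batch size of 64 give 938 batches per epoch, the last one holding 32 samples.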

After all epochs complete, model->eval() is called automatically.

Returns

A TrainingStats struct. If training aborts early (null forward/loss output), epochs_trained will reflect how many epochs actually completed. See TrainingStats.

Example

// Construct optimizer and loss
auto params = seq->parameters();
optim::Adam adam(perm_arena, params, /*lr=*/0.001f);
nn::CrossEntropyLoss criterion;

// Construct trainer
Trainer<optim::Adam, nn::CrossEntropyLoss> trainer(
    seq, &adam, &criterion, graph_arena);

// Train
TrainingStats stats = trainer.fit(X_train, Y_train, /*epochs=*/40,
                                  /*batch_size=*/64, /*log_interval=*/5);

fit_dataloader — Train from a DataLoader

TrainingStats fit_dataloader(data::DataLoader *dataloader,
                             uint32_t epochs,
                             uint32_t log_interval = 100);

Identical to fit but accepts a pre-built DataLoader instead of raw vectors. The dataloader handles batching and shuffling. reset(true) is called on the dataloader at the start of each epoch.

This is the method nn::Model::train() uses internally.

auto* features_ds = Dataset::create_2d(perm, X_train);
auto* labels_ds = Dataset::create_2d(perm, Y_train);
auto* loader = DataLoader::create(features_ds, labels_ds, 64, /*shuffle=*/true);

TrainingStats stats = trainer.fit_dataloader(loader, /*epochs=*/40);
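The per-epoch reset(true) call is what gives each epoch a fresh sample order. A minimal sketch of that idea, assuming the flag means "reshuffle" and that reset(false) restores sequential order: IndexSampler and order() are hypothetical, and only the reset(bool) name mirrors the DataLoader call described above.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical sampler that tracks the order in which samples are visited.
class IndexSampler {
public:
    IndexSampler(std::size_t num_samples, unsigned seed)
        : order_(num_samples), rng_(seed) {
        std::iota(order_.begin(), order_.end(), std::size_t{0});
    }

    // Restore sequential order, then optionally shuffle it.
    void reset(bool shuffle) {
        std::iota(order_.begin(), order_.end(), std::size_t{0});
        if (shuffle) std::shuffle(order_.begin(), order_.end(), rng_);
    }

    const std::vector<std::size_t> &order() const { return order_; }

private:
    std::vector<std::size_t> order_;
    std::mt19937 rng_;
};
```

Calling reset(true) once per epoch visits every sample exactly once per epoch, in a different order each time.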

Prefer fit_dataloader over fit when your dataset is large — DataLoader only copies one batch at a time into the graph arena, whereas fit constructs its own batch tensors each step from the raw vectors.


evaluate — Evaluate from Raw Vectors

float evaluate(const std::vector<std::vector<float>> &X_test,
               const std::vector<std::vector<float>> &Y_test);

Runs the model in eval mode over the test data and returns the mean loss. Uses an internal batch size of min(32, num_samples).

  • Calls model->eval() at the start.
  • Does not call optimizer.step() or backward.
  • Frees each batch from the graph arena with pop_to.

Returns -1.0f on data validation failure.

float test_loss = trainer.evaluate(X_test, Y_test);
std::cout << "Test loss: " << test_loss << "\n";
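For intuition, the batching arithmetic behind evaluate can be sketched as follows. This is an assumption-laden illustration: mean_eval_loss and the batch_loss callback are hypothetical, and weighting each batch by its sample count is one reasonable reading of "mean loss", not a confirmed detail of the library.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>

// batch_loss(start, count) is a hypothetical callback returning the mean loss
// of samples [start, start + count).
float mean_eval_loss(std::size_t num_samples,
                     const std::function<float(std::size_t, std::size_t)> &batch_loss) {
    if (num_samples == 0) return -1.0f;  // mirrors the documented failure value
    const std::size_t batch_size = std::min<std::size_t>(32, num_samples);
    double weighted_sum = 0.0;
    for (std::size_t start = 0; start < num_samples; start += batch_size) {
        const std::size_t count = std::min(batch_size, num_samples - start);
        // Weight each batch by its size so a short final batch is not over-counted.
        weighted_sum += static_cast<double>(batch_loss(start, count)) * count;
    }
    return static_cast<float>(weighted_sum / num_samples);
}
```

With 70 test samples this makes three calls, covering 32, 32, and 6 samples.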

evaluate_dataloader — Evaluate from a DataLoader

float evaluate_dataloader(data::DataLoader *dataloader);

Same as evaluate but consumes batches from a DataLoader. Calls dataloader->reset(false) first (no reshuffling for evaluation — you want a deterministic order).

float test_loss = trainer.evaluate_dataloader(test_loader);

Returns -1.0f if the dataloader is null or returns invalid batches.


Verbose Output

When verbose = true (the default), fit prints:

=== Training Configuration ===
Epochs: 40
Batch Size: 64
Samples: 60000
Input Features: 784
Output Features: 10
Model Parameters: 101770
==============================

Epoch [1/40] | Loss: 2.301842
Epoch [5/40] | Loss: 0.483201
...
Epoch [40/40] | Loss: 0.082341
Training complete! Final Loss: 0.082341

fit_dataloader prints the same but also shows batches per epoch and the feature shape.
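The sample output above (epochs 1, 5, ..., 40 with log_interval = 5) is consistent with a rule like the one sketched below. should_log is a hypothetical helper; logging the first epoch is inferred from the sample output rather than stated, so treat that part as an assumption.

```cpp
#include <cstdint>

// `epoch` is 1-based, matching the "Epoch [1/40]" lines in the sample output.
bool should_log(std::uint32_t epoch, std::uint32_t epochs,
                std::uint32_t log_interval) {
    if (epoch == 1 || epoch == epochs) return true;  // first and last epoch
    // Guard against a zero interval before taking the modulus.
    return log_interval != 0 && epoch % log_interval == 0;
}
```

With the default log_interval of 100 and 40 epochs, this rule would log only the first and last epoch.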


Using Trainer vs Model

                         nn::Model               nn::Trainer
Interface                Keras-style             Explicit
Optimizer/loss combos    Fixed set of enums      Any template combination
Custom training logic    Not possible            Full control
Recommended for          Standard experiments    Research / custom loops

For anything beyond the combinations in LossType and OptimizerType enums, use Trainer directly. For example, to use optim::RMSprop with nn::HuberLoss(delta=2.0f):

optim::RMSprop rms(perm, seq->parameters(), 0.01f);
nn::HuberLoss huber(2.0f);
Trainer<optim::RMSprop, nn::HuberLoss> trainer(seq, &rms, &huber, graph);
TrainingStats stats = trainer.fit_dataloader(loader, 100);

Model::train() only supports the specific optimizer/loss pairings listed in its compile() documentation. Trainer supports any combination.