nn::Module
Module is the abstract base class for every layer, activation, loss, and container in GradCore-Tensor. If you want to write a custom layer, you subclass Module. If you want to understand how a layer works internally, start by understanding Module.
Module::forward is expected to call autograd::* operations, which in turn call tensor_* functions. Module itself does not touch raw tensors directly — it delegates to autograd::Variable ops so that the computation graph is built automatically for backpropagation.
Header: include/nn/core/module.hpp
Class Declaration (simplified)
namespace gradientcore::nn {
class Module {
protected:
std::vector<autograd::Variable *> _parameters;
std::vector<Module *> _modules;
bool _training;
public:
Module();
virtual ~Module() = default;
// Training / eval mode
virtual void train(bool mode = true);
void eval();
bool is_training() const;
// Registration
void register_parameter(autograd::Variable *param);
void register_module(Module *module);
void register_forward_hook(ForwardHook hook);
// Parameter access
virtual std::vector<autograd::Variable *> parameters();
virtual std::map<std::string, autograd::Variable *> named_parameters();
virtual uint64_t num_parameters();
virtual uint64_t num_trainable_parameters();
// Persistence
bool save(const std::string &path, const std::string &format = "binary") const;
bool load(const std::string &path, Arena *arena);
// Summary
virtual void summary();
// The one method every subclass must implement
virtual autograd::Variable *forward(Arena *compute_arena,
autograd::Variable *x) = 0;
// Call operator — runs forward + hooks
autograd::Variable *operator()(Arena *compute_arena, autograd::Variable *x);
};
} // namespace gradientcore::nn
Training vs Eval Mode
Modules carry a _training flag that layers like BatchNorm and Dropout use to behave differently at training vs inference time.
train(bool mode = true)
model.train(); // Switch to training mode (default)
model.train(false); // Switch to eval mode
Recursively sets _training on all registered sub-modules. You rarely call this directly — Trainer::fit calls model->train(true) at the start of training and model->eval() at the end.
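To picture the recursion, here is a minimal sketch built on a stand-in class. It is not GradCore-Tensor's actual implementation: ModeNode and its members are hypothetical names that only mirror the _modules and _training fields from the declaration, and the default of training mode is assumed from the comment above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for nn::Module; only the mode flag and children are modeled.
class ModeNode {
    std::vector<ModeNode *> children_;  // mirrors _modules (non-owning)
    bool training_ = true;              // mirrors _training (assumed default)

public:
    void register_module(ModeNode *child) {
        assert(child != nullptr);
        children_.push_back(child);
    }

    // Set the flag on this node, then push the same mode down to every child.
    void train(bool mode = true) {
        training_ = mode;
        for (ModeNode *child : children_) {
            child->train(mode);
        }
    }

    void eval() { train(false); }
    bool is_training() const { return training_; }
};

// Builds a two-level tree and reports whether eval() reached the innermost node.
inline bool eval_reaches_grandchild() {
    ModeNode root, block, dropout;
    block.register_module(&dropout);
    root.register_module(&block);
    root.eval();
    return !root.is_training() && !block.is_training() && !dropout.is_training();
}
```

Because eval() is just train(false) in this sketch, one call on the root is enough to switch every nested layer.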
eval()
model.eval();
// equivalent to model.train(false)
Switches the module (and all sub-modules) to evaluation mode. In eval mode:
- BatchNorm uses its stored running statistics instead of computing batch statistics.
- Dropout becomes a pass-through: no neurons are dropped.
Call eval() before inference
Forgetting this is one of the most common bugs in deep learning code. A model left in training mode gives different (and wrong) results every time it runs, because Dropout keeps dropping units at random and BatchNorm keeps updating its running stats.
is_training()
if (layer->is_training()) {
// apply dropout, use batch stats, etc.
}
Registration
Layers register their learnable parameters and child modules in their constructors. This is what makes parameters() and save() work automatically — the module hierarchy is a tree, and traversal collects everything.
register_parameter(autograd::Variable *param)
// Inside a custom layer's constructor:
weight = autograd::create_leaf(perm_arena, w_tensor, /*requires_grad=*/true);
register_parameter(weight);
Adds param to _parameters. Only call this for learnable variables (requires_grad = true). The parameter will be included in parameters(), counted by num_parameters(), and saved/loaded by save()/load().
register_module(Module *module)
// Inside Sequential::add():
register_module(module);
Adds a child Module to _modules. Parameters of child modules are collected recursively by parameters().
register_forward_hook(ForwardHook hook)
using ForwardHook = std::function<void(autograd::Variable *)>;
layer->register_forward_hook([](autograd::Variable *out) {
std::cout << "Output shape: " << out->data->shape[0]
<< "x" << out->data->shape[1] << "\n";
});
Hooks are called with the output Variable after every forward() call. Useful for debugging — logging activation statistics, detecting NaNs, etc. — without modifying layer code. Multiple hooks can be registered; they fire in registration order.
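The firing order and the NaN-detection use case can be sketched with stand-in types. Everything here is hypothetical (HookVariable, HookLayer, hook_firing_order); the real Variable stores its data differently, so treat this as an illustration of the mechanism under those assumptions, not the library's code.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for autograd::Variable: just a flat buffer of values.
struct HookVariable {
    std::vector<float> values;
};

using ToyForwardHook = std::function<void(HookVariable *)>;

// Hypothetical stand-in for a layer: forward() doubles every element in place.
class HookLayer {
    std::vector<ToyForwardHook> hooks_;

public:
    void register_forward_hook(ToyForwardHook hook) {
        assert(hook != nullptr);
        hooks_.push_back(std::move(hook));
    }

    HookVariable *forward(HookVariable *x) {
        for (float &v : x->values) v *= 2.0f;
        return x;
    }

    // Run forward(), then call each hook with the output, oldest hook first.
    HookVariable *operator()(HookVariable *x) {
        HookVariable *out = forward(x);
        for (const ToyForwardHook &hook : hooks_) hook(out);
        return out;
    }
};

// Registers a NaN detector ('A') and a second hook ('B'), runs one forward
// pass on data containing a NaN, and returns the order in which hooks fired.
inline std::string hook_firing_order(bool *saw_nan) {
    HookLayer layer;
    std::string order;
    layer.register_forward_hook([&](HookVariable *out) {
        order += 'A';
        for (float v : out->values) {
            if (std::isnan(v)) *saw_nan = true;
        }
    });
    layer.register_forward_hook([&](HookVariable *) { order += 'B'; });

    HookVariable x{{1.0f, NAN, 3.0f}};
    layer(&x);
    return order;
}
```

Note that the hooks only run when the layer is invoked through operator() in this sketch; a direct forward() call bypasses them.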
Parameter Access
parameters()
auto params = model.parameters();
// Returns std::vector<autograd::Variable *>
Returns a flat list of all learnable parameters in the module and all its children, in depth-first traversal order. This is what the optimizer receives.
Results are cached after the first call. The cache is invalidated when register_parameter or register_module is called.
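A rough sketch of depth-first collection with a cache follows. ParamNode and ToyParam are hypothetical stand-ins, and visiting a module's own parameters before its children is an assumption; the text above only says the order is depth-first.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for autograd::Variable.
struct ToyParam {
    int id;
};

// Hypothetical stand-in for nn::Module, reduced to parameter bookkeeping.
class ParamNode {
    std::vector<ToyParam *> own_params_;  // mirrors _parameters
    std::vector<ParamNode *> children_;   // mirrors _modules
    std::vector<ToyParam *> cache_;
    bool cache_valid_ = false;

    // Own parameters first, then each child's subtree in registration order.
    void collect(std::vector<ToyParam *> &out) const {
        out.insert(out.end(), own_params_.begin(), own_params_.end());
        for (const ParamNode *child : children_) child->collect(out);
    }

public:
    void register_parameter(ToyParam *p) {
        assert(p != nullptr);
        own_params_.push_back(p);
        cache_valid_ = false;  // any registration invalidates the cached list
    }

    void register_module(ParamNode *m) {
        assert(m != nullptr);
        children_.push_back(m);
        cache_valid_ = false;
    }

    // Rebuilds the flat list only when something changed since the last call.
    const std::vector<ToyParam *> &parameters() {
        if (!cache_valid_) {
            cache_.clear();
            collect(cache_);
            cache_valid_ = true;
        }
        return cache_;
    }
};
```

One limitation of this sketch: registering a parameter on a child after the parent has cached its list leaves the parent's cache stale. Registering everything in constructors, before the first parameters() call, avoids the problem.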
named_parameters()
auto named = model.named_parameters();
for (auto& [name, param] : named) {
std::cout << name << ": " << param->data->size << " elements\n";
}
Returns the same parameters as a std::map<std::string, Variable*> with auto-generated dot-notation names (e.g. "0.0", "0.1", "1.0"). Useful for debugging and selective freezing.
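Selective freezing by name prefix might look like the sketch below. It assumes that requires_grad is a plain writable flag on the variable and that names sharing a leading index belong to the same child module; FreezeVariable and freeze_by_prefix are hypothetical helpers, not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for autograd::Variable; only the flag matters here.
struct FreezeVariable {
    bool requires_grad = true;
};

// Clears requires_grad on every parameter whose name starts with `prefix`.
// Returns how many parameters were frozen.
inline std::size_t freeze_by_prefix(
    const std::map<std::string, FreezeVariable *> &named,
    const std::string &prefix) {
    std::size_t frozen = 0;
    for (const auto &[name, param] : named) {
        assert(param != nullptr);
        // compare() on the leading prefix.size() characters is a starts-with test.
        if (name.compare(0, prefix.size(), prefix) == 0) {
            param->requires_grad = false;
            ++frozen;
        }
    }
    return frozen;
}
```

Include the trailing dot in the prefix ("1." rather than "1") so that freezing module 1 does not also match names such as "10.0".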
num_parameters()
std::cout << "Total params: " << model.num_parameters() << "\n";
// e.g. "Total params: 101770" (MNIST MLP)
Sum of element counts across all parameters. Used for sanity-checking your architecture.
num_trainable_parameters()
std::cout << "Trainable: " << model.num_trainable_parameters() << "\n";
Same as num_parameters() but only counts parameters where requires_grad == true. Useful if you have frozen layers.
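The two counts are easy to check by hand. The layer sizes are not stated here, but one layout that reproduces the 101770 figure is a 784 -> 128 -> 10 MLP with biases: 784*128 + 128 + 128*10 + 10 = 101770. The sketch below uses that assumed layout and freezes the first layer purely for illustration; CountedParam and the helper functions are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a parameter: element count plus the trainable flag.
struct CountedParam {
    std::uint64_t size;
    bool requires_grad;
};

// Sum of element counts across all parameters.
inline std::uint64_t count_all(const std::vector<CountedParam> &params) {
    std::uint64_t total = 0;
    for (const CountedParam &p : params) total += p.size;
    return total;
}

// Same sum, restricted to parameters that still require gradients.
inline std::uint64_t count_trainable(const std::vector<CountedParam> &params) {
    std::uint64_t total = 0;
    for (const CountedParam &p : params) {
        if (p.requires_grad) total += p.size;
    }
    return total;
}

// Assumed 784 -> 128 -> 10 MLP; the first layer is frozen for illustration.
inline std::vector<CountedParam> example_mlp_params() {
    return {
        {784u * 128u, false},  // layer 0 weight: 100352 elements
        {128u, false},         // layer 0 bias
        {128u * 10u, true},    // layer 1 weight: 1280 elements
        {10u, true},           // layer 1 bias
    };
}
```

With the first layer frozen, the total stays at 101770 while the trainable count drops to 1280 + 10 = 1290.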
Persistence
save(path, format)
bool ok_bin  = module.save("model.bin", "binary");
bool ok_json = module.save("model.json", "json");
bool ok_csv  = module.save("model.csv", "csv");
Saves all parameters to disk. Three formats are supported:
| Format | Notes |
|---|---|
| "binary" | Compact, fast, not human-readable. Recommended. |
| "json" | Base64-encodes float data. Loading is simplified; use binary for production. |
| "csv" | One row per element: param_index, element_index, value. Inspectable but large. |
The binary format writes: a uint32_t parameter count, then for each parameter a uint64_t size followed by the raw float bytes. Simple and reliable.
Returns true on success, false on file-open or structure errors.
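The binary layout is simple enough to sketch in a few lines. This is an illustration written against a std::ostream, not the library's writer: it assumes the uint64_t size is an element count (not a byte count) and that values are written in the host's byte order, neither of which is confirmed above. write_params_binary is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <vector>

// Writes: uint32_t parameter count, then per parameter a uint64_t element
// count followed by the raw float bytes. Byte order is the host's (assumed).
inline bool write_params_binary(std::ostream &out,
                                const std::vector<std::vector<float>> &params) {
    const std::uint32_t count = static_cast<std::uint32_t>(params.size());
    out.write(reinterpret_cast<const char *>(&count), sizeof(count));

    for (const std::vector<float> &p : params) {
        const std::uint64_t size = p.size();
        out.write(reinterpret_cast<const char *>(&size), sizeof(size));
        out.write(reinterpret_cast<const char *>(p.data()),
                  static_cast<std::streamsize>(size * sizeof(float)));
    }
    return out.good();  // false if any write failed
}
```

Under these assumptions, a model with parameters of 3 and 2 floats produces 4 + (8 + 12) + (8 + 8) = 40 bytes when float is 4 bytes wide.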
load(path, arena)
bool ok = module.load("model.bin", perm_arena);
Loads parameters from disk back into the module's existing parameter tensors. The module must already be constructed with the same architecture — load does not create layers, it fills them.
Checks:
- Parameter count must match exactly.
- Each parameter's element count must match exactly.
If either check fails, load returns false and prints an error. Always check the return value.
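The two checks can be sketched as a reader that fills buffers that already exist. As before, this is an illustration under assumptions (element-count sizes, host byte order, a std::istream instead of a file path), and read_params_binary is a hypothetical name rather than the library's function.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>
#include <istream>
#include <sstream>
#include <vector>

// Fills existing buffers from the stream; never resizes or creates them.
inline bool read_params_binary(std::istream &in,
                               std::vector<std::vector<float>> &params) {
    std::uint32_t count = 0;
    if (!in.read(reinterpret_cast<char *>(&count), sizeof(count))) return false;
    if (count != params.size()) {  // check 1: parameter count
        std::cerr << "parameter count mismatch\n";
        return false;
    }

    for (std::vector<float> &p : params) {
        std::uint64_t size = 0;
        if (!in.read(reinterpret_cast<char *>(&size), sizeof(size))) return false;
        if (size != p.size()) {    // check 2: element count per parameter
            std::cerr << "parameter size mismatch\n";
            return false;
        }
        if (!in.read(reinterpret_cast<char *>(p.data()),
                     static_cast<std::streamsize>(size * sizeof(float)))) {
            return false;
        }
    }
    return true;
}
```

In this sketch a failure partway through leaves earlier parameters already overwritten, which is one more reason to treat a false return as fatal rather than continuing with the model.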
Use model.load(), not module.load() directly
The nn::Model class wraps load with a nicer interface. See Model.
forward() — The One Method You Must Implement
virtual autograd::Variable *forward(Arena *compute_arena,
autograd::Variable *x) = 0;
Takes an input Variable and returns an output Variable. All intermediate tensors must be allocated on compute_arena (the graph arena), which will be rewound after each batch.
The compute_arena pointer is passed explicitly so each layer knows exactly where to allocate its outputs — there are no hidden global allocators.
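To see why rewinding after each batch keeps memory flat, consider a toy bump allocator. This is a sketch only: ToyArena, mark, rewind, and used are hypothetical names, GradCore-Tensor's real Arena API may look quite different, and alignment handling is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bump allocator; the real Arena API may differ.
class ToyArena {
    std::vector<unsigned char> buffer_;
    std::size_t offset_ = 0;

public:
    explicit ToyArena(std::size_t capacity) : buffer_(capacity) {}

    // Hands out the next `bytes` bytes, or nullptr when the arena is full.
    void *allocate(std::size_t bytes) {
        if (bytes > buffer_.size() - offset_) return nullptr;
        void *p = buffer_.data() + offset_;
        offset_ += bytes;
        return p;
    }

    std::size_t mark() const { return offset_; }

    // Drops everything allocated after `saved` by moving the offset back.
    void rewind(std::size_t saved) {
        assert(saved <= offset_);
        offset_ = saved;
    }

    std::size_t used() const { return offset_; }
};

// Simulates a training loop: each batch allocates scratch space, then rewinds.
// Returns the highest usage seen, or 0 if any allocation failed.
inline std::size_t peak_usage_over_batches(ToyArena &arena, int batches,
                                           std::size_t bytes_per_batch) {
    std::size_t peak = 0;
    for (int i = 0; i < batches; ++i) {
        const std::size_t start = arena.mark();
        if (arena.allocate(bytes_per_batch) == nullptr) return 0;  // stands in for forward() outputs
        if (arena.used() > peak) peak = arena.used();
        arena.rewind(start);  // all graph memory for this batch is released at once
    }
    return peak;
}
```

A hundred batches of 512 bytes fit in a 1024-byte arena here only because each batch rewinds before the next one starts.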
Calling convention
autograd::Variable *out = layer->forward(graph_arena, x);
// or equivalently, using operator():
autograd::Variable *out = (*layer)(graph_arena, x);
operator() calls forward and then fires any registered hooks on the output.
Implementing a custom layer
class MyScaleLayer : public nn::Module {
float scale_factor;
public:
// explicit prevents accidental implicit conversion from float to a layer
explicit MyScaleLayer(float s) : scale_factor(s) {}
autograd::Variable *forward(Arena *compute_arena,
autograd::Variable *x) override {
// Delegate to an autograd op — this builds the graph automatically
return autograd::scale(compute_arena, x, scale_factor);
}
};
summary()
module.summary();
Prints a brief description to stdout:
Module Summary:
Total Parameters: 101770
Trainable Parameters: 101770
Training Mode: true
Concrete subclasses (like Linear and Sequential) override this with more specific output.