Tensors in the Autograd Engine

The tensor module provides the data. The autograd engine provides the memory of what you did with that data — so it can run the chain rule backwards. Here's how the two connect.

`autograd::Variable`

Every tensor that participates in a differentiable computation is wrapped in a Variable:

struct Variable {
    Tensor  *data;           // The actual tensor values
    Tensor  *grad;           // Gradient accumulator (same shape as data)
    bool     requires_grad;  // Should we compute gradients for this?
    bool     is_leaf;        // Created by create_leaf (parameter or input data) rather than by an op?

    Edge    *parents;        // Inputs to the op that created this Variable
    uint32_t num_parents;

    Tensor **saved_tensors;  // Tensors saved for the backward pass
    uint32_t num_saved;

    uint32_t     reduction;      // For loss ops
    float        metadata_float; // alpha, delta, scale, etc.

    void (*backward_fn)(Variable *self, Arena *arena);
};

Variable structs live on the graph arena — they're freed en masse when you call graph_arena->pop_to(pos) after each batch.

create_leaf — wrapping a tensor

Variable *x = autograd::create_leaf(graph_arena, t_x, false);

create_leaf is how you turn a raw Tensor into something the autograd graph can track:

Variable *create_leaf(Arena *arena, Tensor *data, bool requires_grad) {
    Variable *v     = arena->push<Variable>();
    v->data         = data;
    v->requires_grad = requires_grad;
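    v->is_leaf      = true;
    v->backward_fn  = nullptr;  // Leaves don't have a backward fn
    // Leaves have no inputs and save nothing (set explicitly in case
    // push<Variable>() does not zero the memory it returns)
    v->parents       = nullptr;
    v->num_parents   = 0;
    v->saved_tensors = nullptr;
    v->num_saved     = 0;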

    if (requires_grad) {
        v->grad = tensor_create_zeros(arena, data->ndims, data->shape);
    } else {
        v->grad = nullptr;
    }
    return v;
}

requires_grad = true: This is a parameter — the optimizer will update it, and backward will accumulate into v->grad.
requires_grad = false: This is data (input batch, targets) — no gradients needed, no grad tensor allocated.

How Ops Build the Graph

Every differentiable operation (e.g. autograd::relu) does three things:

1. Compute the output using the corresponding tensor_* function.
2. Allocate a new Variable for the result on the graph arena.
3. Wire up the backward function.

Variable *relu(Arena *arena, Variable *in) {
    // 1. Forward computation
    Tensor *out_data = tensor_create_zeros(arena, in->data->ndims, in->data->shape);
    tensor_relu(out_data, in->data);

    // 2. Allocate output Variable
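    Variable *out = arena->push<Variable>();
    out->data         = out_data;
    out->requires_grad = in->requires_grad;
    out->is_leaf      = false;

    // Defaults for the no-grad path (set explicitly in case
    // push<Variable>() does not zero the memory it returns)
    out->grad          = nullptr;
    out->backward_fn   = nullptr;
    out->parents       = nullptr;
    out->num_parents   = 0;
    out->saved_tensors = nullptr;
    out->num_saved     = 0;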

    if (out->requires_grad) {
        // Allocate grad tensor
        out->grad = tensor_create_zeros(arena, out_data->ndims, out_data->shape);

        // Wire parents (for graph traversal)
        out->num_parents = 1;
        out->parents     = arena->push_array<Edge>(1);
        out->parents[0]  = {in};

        // Save tensors needed by backward
        out->num_saved       = 1;
        out->saved_tensors   = arena->push_array<Tensor *>(1);
        out->saved_tensors[0] = in->data;   // Need input to compute gradient mask

        // 3. Backward closure
        out->backward_fn = [](Variable *self, Arena *temp_arena) {
            Variable *parent = self->parents[0].node;
            if (!parent->requires_grad) return;

            Tensor *local_grad = tensor_create_zeros(
                temp_arena, parent->grad->ndims, parent->grad->shape);

            tensor_relu_grad(local_grad, self->saved_tensors[0], self->grad);
            tensor_add(parent->grad, parent->grad, local_grad);
        };
    }
    return out;
}

The result is a DAG (directed acyclic graph) of Variable nodes:

input ─► relu ─► linear ─► cross_entropy ─► loss
  │        │        │             │
  └parent  └parent  └parent      └parent
            │        │
           [saved:  [saved:
            input]   input, weight]

Backward Pass

autograd::backward(arena, loss_node) walks the graph in reverse topological order, from the loss back to the leaves:

void backward(Arena *arena, Variable *loss_node) {
    // Start gradient: d(loss)/d(loss) = 1
    tensor_fill(loss_node->grad, 1.0f);

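    // Topological sort of the computation graph (parents before children,
    // so loss_node ends up last)
    std::unordered_set<Variable *> visited;  // container type assumed; build_topo is not shown here
    std::vector<Variable *> topo;
    build_topo(loss_node, visited, topo);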

    // Reverse traversal: apply each backward_fn
    for (auto it = topo.rbegin(); it != topo.rend(); ++it) {
        if ((*it)->backward_fn)
            (*it)->backward_fn(*it, arena);
    }
}
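
build_topo is called above but its body isn't shown. One common way to write it is a post-order depth-first search, sketched below. Treat this as an illustration rather than the engine's actual code: Node is a simplified stand-in for Variable (a std::vector of parent pointers instead of the Edge array), and the signature is an assumption inferred from the call site.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for Variable: only the graph links matter here.
struct Node {
    std::vector<Node *> parents;
};

// Post-order DFS: a node is appended only after all of its parents,
// so the node we start from (the loss) ends up last in `topo`.
void build_topo(Node *node,
                std::unordered_set<Node *> &visited,
                std::vector<Node *> &topo) {
    assert(node != nullptr);
    // insert() reports whether the node was new; skip nodes already visited.
    if (!visited.insert(node).second) return;
    for (Node *parent : node->parents) {
        build_topo(parent, visited, topo);
    }
    topo.push_back(node);
}
```

Because the output node is appended last, iterating with rbegin()/rend() visits it first, and each node's backward_fn runs before those of its parents. The visited set matters when a node feeds two ops: it is still emitted only once.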

Each backward_fn:

Computes the local gradient contribution using tensor_*_grad.
Adds it to the parent's grad tensor (tensor_add(parent->grad, parent->grad, local_grad)).

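Gradients accumulate via tensor_add rather than being overwritten. This is the correct behaviour for any Variable that feeds more than one op, such as a parameter that appears in multiple places in the graph: by the chain rule, its total gradient is the sum of the contributions from every use.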
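
To make the accumulation concrete, here is a small sketch on flat float arrays. The accumulate helper is hypothetical; it only mirrors what tensor_add(parent->grad, parent->grad, local_grad) does in the backward closures above, with std::vector standing in for Tensor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flat-array analogue of
// tensor_add(parent->grad, parent->grad, local_grad):
// adds one consumer's contribution into the running gradient.
void accumulate(std::vector<float> &grad, const std::vector<float> &local_grad) {
    assert(grad.size() == local_grad.size());
    for (std::size_t i = 0; i < grad.size(); ++i) {
        grad[i] += local_grad[i];
    }
}
```

If a Variable is consumed by two ops, accumulate runs once per consumer, and the grad buffer ends up holding the sum of both contributions. Overwriting instead of adding would silently keep only whichever consumer ran last.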

Memory Layout During a Batch

graph_arena (before batch):
┌─────────────────────────────┐  ← saved_pos
│                             │
│                             │

graph_arena (during forward):
┌──────────────┬──────────────┬──────────────┬──────────────┐
│  batch data  │  activations │   Variables  │ grad tensors │
│  (x, y)      │  (post-relu) │   (DAG)      │              │
└──────────────┴──────────────┴──────────────┴──────────────┘

After pop_to(saved_pos): entirely reclaimed → back to empty

Everything allocated after saved_pos — intermediate tensors, grad tensors, the graph nodes themselves — vanishes in a single pointer reset. The only things that survive are on perm_arena: the model parameters and their gradient accumulators.
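
The "single pointer reset" can be sketched with a toy bump allocator. SketchArena is hypothetical: the real Arena's internals aren't shown in this document, so the storage, the pos() accessor, and the alignment handling here are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <type_traits>
#include <vector>

// Hypothetical bump allocator illustrating push / pop_to.
struct SketchArena {
    std::vector<std::uint8_t> buffer;  // backing storage, allocated once
    std::size_t offset = 0;            // next free byte

    explicit SketchArena(std::size_t capacity) : buffer(capacity) {}

    // Current position; save this before a batch.
    std::size_t pos() const { return offset; }

    // Bump-allocate one value-initialized T.
    template <typename T>
    T *push() {
        // pop_to never runs destructors, so only allow types that need none.
        static_assert(std::is_trivially_destructible<T>::value,
                      "arena objects are never destroyed individually");
        std::size_t aligned = (offset + alignof(T) - 1) & ~(alignof(T) - 1);
        assert(aligned + sizeof(T) <= buffer.size() && "arena exhausted");
        offset = aligned + sizeof(T);
        return new (buffer.data() + aligned) T();
    }

    // Release everything allocated after `saved` by moving one offset.
    void pop_to(std::size_t saved) {
        assert(saved <= offset);
        offset = saved;
    }
};
```

Nothing is freed object by object: pop_to just moves the offset back, and the next batch reuses the same bytes. That is also why pointers into the graph arena must not be kept across batches.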

saved_tensors vs parents

These serve different purposes:

parents: Links for graph traversal (the topology). Points to Variable nodes.
saved_tensors: Data needed by the backward function. Points to Tensor data, because the input data is what you need to compute the local gradient — not the whole Variable.

For relu, the backward needs the pre-activation values (to know which elements were positive). For cross_entropy_loss, the backward needs both the logits and the targets. Only what's strictly necessary is saved.
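
Here is a sketch of why relu's backward needs the saved input. The relu_grad function below is a hypothetical flat-array analogue of tensor_relu_grad, following the argument roles used above (saved input, then upstream gradient); the real function's implementation isn't shown in this document.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical analogue of tensor_relu_grad(out, saved_input, upstream):
// the upstream gradient passes through only where the input was positive.
std::vector<float> relu_grad(const std::vector<float> &saved_input,
                             const std::vector<float> &upstream) {
    assert(saved_input.size() == upstream.size());
    std::vector<float> local_grad(saved_input.size(), 0.0f);
    for (std::size_t i = 0; i < saved_input.size(); ++i) {
        if (saved_input[i] > 0.0f) {
            local_grad[i] = upstream[i];
        }
    }
    return local_grad;
}
```

The mask depends only on the input values, so a pointer to the input Tensor is all the backward function has to keep; the rest of the input Variable is reachable through parents when it is needed for traversal.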
