Tutorial 2: MNIST Digit Classification

This tutorial covers multi-class image classification using the MNIST handwritten digit dataset. You will learn how to set up batched loading with DataLoader, build a classifier, train with CrossEntropyLoss, evaluate test accuracy, and run per-sample inference with a visual terminal display.

The full source lives in examples/mnist/.

Dataset - Github

What you'll build

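A multilayer perceptron with three layers of units (784 inputs, 128 hidden units, 10 outputs) that classifies 28×28 greyscale digit images (flattened to 784 floats) into one of 10 classes. This is the model behind GradCore-Tensor's MNIST benchmark, which reaches 97.77 % test accuracy.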

Architecture at a glance:

Input (784) → Linear(784→128) → ReLU → Linear(128→10)

Optimizer: Adam, Loss: CrossEntropyLoss, LR: 0.0005, Epochs: 40, Batch size: 64.

Prerequisites

The MNIST CSVs are large files tracked by Git LFS (~110 MB train, ~18 MB test). Pull them before building:

git lfs pull
cd examples/mnist
./run.sh    # compiles train, inference, and autoencoder binaries

Step 1 — Load and preprocess MNIST

MNIST is distributed as pixel rows in CSV format. The first column is the label (0–9); the remaining 784 columns are pixel intensities (0–255).

#include "../../include/gradient.hpp"
using namespace gradientcore;

// Load training data (skip header row)
auto csv_raw = CSVLoader::load_csv("data/mnist_train.csv", true);

std::vector<std::vector<float>> features, labels_raw, labels_onehot;
CSVLoader::parse_mnist_csv(csv_raw, features, labels_raw);
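To make the row layout concrete, here is a minimal standalone sketch of splitting one CSV line into its label and pixel values. `MnistRow` and `parse_mnist_row` are hypothetical names used only for illustration; they are not part of GradCore-Tensor, and `CSVLoader::parse_mnist_csv` may work differently internally.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper type, not part of GradCore-Tensor: one parsed CSV row.
struct MnistRow {
    int label;                  // first column, 0-9
    std::vector<float> pixels;  // remaining columns, raw intensities 0-255
};

// Split one comma-separated line into a label and its pixel values.
MnistRow parse_mnist_row(const std::string& line) {
    std::stringstream ss(line);
    std::string cell;
    MnistRow row{};
    bool first = true;
    while (std::getline(ss, cell, ',')) {
        if (first) { row.label = std::stoi(cell); first = false; }
        else       { row.pixels.push_back(std::stof(cell)); }
    }
    assert(!first && "row must contain at least a label column");
    return row;
}
```

A real MNIST row would yield `pixels.size() == 784`; the sketch accepts any width so it can be tried on short strings.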

Normalize pixel values

normalize_minmax scales every value to [0, 1], which stabilizes gradient magnitudes during training:

CSVLoader::normalize_minmax(features);
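The scaling itself can be sketched as follows. This version assumes a single global minimum and maximum over the whole matrix; whether `normalize_minmax` works globally or per column is not stated here, so treat `minmax_normalize` as an illustrative stand-in rather than the library's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Scale every value to [0, 1] using the global min and max of the matrix.
// Assumption: global (not per-column) scaling; the library may differ.
void minmax_normalize(std::vector<std::vector<float>>& rows) {
    float lo = 0.0f, hi = 0.0f;
    bool seen = false;
    for (const auto& r : rows)
        for (float v : r) {
            if (!seen) { lo = hi = v; seen = true; }
            lo = std::min(lo, v);
            hi = std::max(hi, v);
        }
    if (!seen || hi == lo) return;  // empty or constant input: nothing to scale
    const float range = hi - lo;
    assert(range > 0.0f);
    for (auto& r : rows)
        for (float& v : r) v = (v - lo) / range;
}
```

For MNIST, where intensities span 0 to 255, this amounts to dividing each pixel by 255.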

One-hot encode labels

CrossEntropyLoss expects a target distribution, not a scalar class index. one_hot_encode converts each integer label into a length-10 vector with a single 1.0 at the correct class position:

CSVLoader::one_hot_encode(labels_raw, 10, labels_onehot);
// e.g. label "3" → [0,0,0,1,0,0,0,0,0,0]
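For a single label, the encoding reduces to a few lines. The function below is a standalone sketch with a hypothetical name (`one_hot`), not the library routine.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Turn a class index into a vector of num_classes floats with a single 1.0.
std::vector<float> one_hot(int label, std::size_t num_classes) {
    assert(label >= 0 && static_cast<std::size_t>(label) < num_classes);
    std::vector<float> v(num_classes, 0.0f);
    v[static_cast<std::size_t>(label)] = 1.0f;
    return v;
}
```

Because the entries sum to 1, the result can be read as a probability distribution that puts all its mass on the true class.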

Step 2 — Allocate arenas

MNIST training loads 60 000 images. The permanent arena must be large enough to hold the model parameters plus Adam's two moment tensors per parameter:

auto* perm_arena  = Arena::create(MiB(1024), MiB(64), true);
auto* graph_arena = Arena::create(MiB(512),  MiB(32), true);

The graph arena is rewound after each batch, so its size only needs to accommodate one batch at a time rather than the full dataset.
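The rewind pattern is easiest to see in a toy bump allocator. `BumpArena` below is a hypothetical, simplified class; it borrows the method names `get_pos` and `pop_to` from this tutorial, but its internals are an assumption and GradCore-Tensor's `Arena` is likely more elaborate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal bump allocator, for illustration only.
class BumpArena {
public:
    explicit BumpArena(std::size_t capacity) : buf_(capacity), pos_(0) {}

    // Hand out `bytes` bytes, rounding the size up to a multiple of 16.
    // Returns nullptr when the arena is full.
    void* push(std::size_t bytes) {
        const std::size_t aligned = (bytes + 15) & ~static_cast<std::size_t>(15);
        if (pos_ + aligned > buf_.size()) return nullptr;
        void* p = buf_.data() + pos_;
        pos_ += aligned;
        return p;
    }

    std::uint64_t get_pos() const { return pos_; }

    // Rewind to an earlier position: releases everything pushed since, in O(1).
    void pop_to(std::uint64_t pos) {
        assert(pos <= pos_);
        pos_ = static_cast<std::size_t>(pos);
    }

private:
    std::vector<unsigned char> buf_;
    std::size_t pos_;
};
```

Recording `get_pos()` at the start of a batch and calling `pop_to()` at the end means peak usage is bounded by the largest single batch, which is why the graph arena can be smaller than the permanent one.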

Step 3 — Define the model

nn::Model model(perm_arena, graph_arena);

auto* l1 = perm_arena->push<nn::Linear>();
new (l1) nn::Linear(perm_arena, 784, 128);
model.add_layer(l1);

auto* relu = perm_arena->push<nn::ReLU>();
new (relu) nn::ReLU();
model.add_layer(relu);

auto* l2 = perm_arena->push<nn::Linear>();
new (l2) nn::Linear(perm_arena, 128, 10);
model.add_layer(l2);

The final Linear(128→10) produces 10 raw logits — one per digit class. CrossEntropyLoss applies softmax internally, so no explicit Softmax layer is needed here.

Verifying the parameter count

std::cout << "Total parameters: "
          << model.get_model()->num_parameters() << std::endl;
// Expected: 784*128 + 128 + 128*10 + 10 = 101 770
if (model.get_model()->num_parameters() == 0) {
    std::cerr << "Error: 0 parameters — check add_layer." << std::endl;
    return 1;
}
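The expected count can be reproduced without the library. The helper below (`mlp_param_count`, a hypothetical name) is a small sketch that sums weights and biases for a stack of fully connected layers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Count weights and biases of a stack of fully connected layers.
// sizes = {784, 128, 10} describes Linear(784->128) followed by Linear(128->10).
std::size_t mlp_param_count(const std::vector<std::size_t>& sizes) {
    assert(sizes.size() >= 2);
    std::size_t total = 0;
    for (std::size_t i = 0; i + 1 < sizes.size(); ++i)
        total += sizes[i] * sizes[i + 1]   // weight matrix
               + sizes[i + 1];             // bias vector
    return total;
}
```

Calling `mlp_param_count({784, 128, 10})` gives 101770, the figure quoted in the comment above. ReLU contributes no parameters, so it does not appear in the size list.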

Step 4 — Compile and train

model.compile(
    nn::OptimizerType::ADAM,
    nn::LossType::CROSS_ENTROPY,
    0.0005f,   // learning rate
    40,        // epochs
    64         // batch size
);

model.train(features, labels_onehot);

Why CrossEntropyLoss? It applies log-softmax to the logits internally and computes the negative log-likelihood against the one-hot target, penalizing confidently wrong predictions much more than uncertain ones.
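The computation described above can be sketched for a single sample as follows. This is a standalone illustration of log-softmax followed by negative log-likelihood, not GradCore-Tensor's implementation; the function name `cross_entropy` is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Cross-entropy for one sample: -sum_c target[c] * log_softmax(logits)[c].
float cross_entropy(const std::vector<float>& logits,
                    const std::vector<float>& target) {
    assert(!logits.empty() && logits.size() == target.size());
    // Subtract the max logit before exponentiating for numerical stability.
    const float m = *std::max_element(logits.begin(), logits.end());
    float sum_exp = 0.0f;
    for (float z : logits) sum_exp += std::exp(z - m);
    const float log_sum = m + std::log(sum_exp);  // log of the softmax denominator
    float loss = 0.0f;
    for (std::size_t c = 0; c < logits.size(); ++c)
        loss -= target[c] * (logits[c] - log_sum);
    return loss;
}
```

With three classes and the true class at index 1, uniform logits `{0, 0, 0}` give a loss of ln 3 (about 1.10), while the confidently wrong logits `{5, 0, 0}` give a loss slightly above 5.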

Why Adam with LR 0.0005? This is half of the 0.001 commonly used as Adam's default. The smaller step keeps the optimizer from overshooting the minimum in the later epochs, which matters when pushing past 97 % accuracy.
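To show how the learning rate bounds the step size, here is a minimal sketch of a single Adam update. `AdamState` and `adam_step` are hypothetical names, and the `beta1`, `beta2`, and `eps` defaults are the values proposed in the original Adam paper; GradCore-Tensor's optimizer may use different defaults or a different layout.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// State Adam keeps per parameter tensor: first and second moment estimates.
struct AdamState {
    std::vector<float> m, v;
    int t = 0;  // number of updates applied so far
};

// Apply one Adam update in place (hyperparameter defaults are assumptions).
void adam_step(std::vector<float>& w, const std::vector<float>& grad,
               AdamState& s, float lr = 0.0005f,
               float beta1 = 0.9f, float beta2 = 0.999f, float eps = 1e-8f) {
    assert(w.size() == grad.size());
    if (s.m.empty()) { s.m.assign(w.size(), 0.0f); s.v.assign(w.size(), 0.0f); }
    ++s.t;
    // Bias-correction factors for the zero-initialized moments.
    const float c1 = 1.0f - std::pow(beta1, static_cast<float>(s.t));
    const float c2 = 1.0f - std::pow(beta2, static_cast<float>(s.t));
    for (std::size_t i = 0; i < w.size(); ++i) {
        s.m[i] = beta1 * s.m[i] + (1.0f - beta1) * grad[i];
        s.v[i] = beta2 * s.v[i] + (1.0f - beta2) * grad[i] * grad[i];
        const float m_hat = s.m[i] / c1;
        const float v_hat = s.v[i] / c2;
        w[i] -= lr * m_hat / (std::sqrt(v_hat) + eps);
    }
}
```

On the first update, each weight moves by almost exactly `lr` in the direction opposite its gradient, regardless of the gradient's magnitude. The two moment vectors are also why the permanent arena has to hold two extra tensors per parameter.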

Step 5 — Evaluate test accuracy

After training, load the test split and compute top-1 accuracy in batches of 100:

auto test_csv = CSVLoader::load_csv("data/mnist_test.csv", true);
std::vector<std::vector<float>> test_features, test_labels_raw;
CSVLoader::parse_mnist_csv(test_csv, test_features, test_labels_raw);
CSVLoader::normalize_minmax(test_features);

model.get_model()->eval();   // disable dropout, use BatchNorm running stats

uint32_t correct = 0;
uint32_t batch_size = 100;

for (uint32_t i = 0; i < test_features.size(); i += batch_size) {
    uint32_t current_bs = std::min(batch_size, (uint32_t)test_features.size() - i);
    uint64_t start_pos  = graph_arena->get_pos();

    uint32_t shape_x[2] = {current_bs, 784};
    Tensor* t_x = tensor_create(graph_arena, 2, shape_x);

    for (uint32_t b = 0; b < current_bs; b++)
        for (uint32_t j = 0; j < 784; j++)
            t_x->storage->data[t_x->offset + b * 784 + j] =
                test_features[i + b][j];

    autograd::Variable* x   = autograd::create_leaf(graph_arena, t_x, false);
    autograd::Variable* out = model.get_model()->forward(graph_arena, x);

    for (uint32_t b = 0; b < current_bs; b++) {
        float max_v = -1e9f; int pred = 0;
        for (int c = 0; c < 10; c++) {
            float v = out->data->storage->data[out->data->offset + b * 10 + c];
            if (v > max_v) { max_v = v; pred = c; }
        }
        if (pred == (int)test_labels_raw[i + b][0]) correct++;
    }

    graph_arena->pop_to(start_pos);
}

float accuracy = 100.0f * correct / test_features.size();
std::cout << "Test Accuracy: " << correct << " / " << test_features.size()
          << " (" << accuracy << "%)" << std::endl;
// → Test Accuracy: 9777 / 10000 (97.77%)
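Stripped of tensor and arena details, the accuracy computation is an argmax per row followed by a comparison with the label. The sketch below works on a plain row-major buffer; `argmax_row` and `top1_accuracy` are hypothetical helper names, not library functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Index of the largest logit in row `b` of a row-major [batch x classes] buffer.
int argmax_row(const std::vector<float>& logits, std::size_t b, std::size_t classes) {
    assert(classes > 0 && (b + 1) * classes <= logits.size());
    std::size_t best = 0;
    for (std::size_t c = 1; c < classes; ++c)
        if (logits[b * classes + c] > logits[b * classes + best]) best = c;
    return static_cast<int>(best);
}

// Percentage of rows whose argmax equals the integer label.
float top1_accuracy(const std::vector<float>& logits,
                    const std::vector<int>& labels, std::size_t classes) {
    assert(!labels.empty());
    std::size_t correct = 0;
    for (std::size_t b = 0; b < labels.size(); ++b)
        if (argmax_row(logits, b, classes) == labels[b]) ++correct;
    return 100.0f * static_cast<float>(correct) / static_cast<float>(labels.size());
}
```

Seeding the search with the first element, rather than a sentinel such as `-1e9f`, keeps the argmax correct even if every logit is very negative.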

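Note

Call model.get_model()->eval() before evaluating. The MLP in this tutorial has no Dropout or BatchNorm layers, so the call should not change its outputs, but it is a habit worth keeping: in a model that does contain them, Dropout would keep zeroing activations and BatchNorm would keep updating its running statistics from test data, corrupting the evaluation metrics.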

Step 6 — Save the model

#include <filesystem>
std::filesystem::create_directories("bin");
model.save("bin/mnist_model.bin", "binary");

Step 7 — Inference with terminal visualization

The inference binary loads a saved model and lets you inspect any test sample interactively. It renders the digit using ANSI 256-color escape codes directly in your terminal.

if (!model.load("bin/mnist_model.bin")) { return 1; }
model.get_model()->eval();

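int n;
std::cout << "Enter sample index (0 to " << features.size() - 1 << "): ";
// Reject non-numeric input and indices outside the loaded samples
if (!(std::cin >> n) || n < 0 || n >= (int)features.size()) {
    std::cerr << "Error: index out of range." << std::endl;
    return 1;
}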

draw_mnist_digit(features[n].data());   // ANSI terminal render

uint32_t shape[2] = {1, 784};
Tensor* input = tensor_create(graph_arena, 2, shape);
// Respect the tensor's storage offset, as in the evaluation loop
std::memcpy(input->storage->data + input->offset, features[n].data(), 784 * sizeof(float));

autograd::Variable* x   = autograd::create_leaf(graph_arena, input, false);
autograd::Variable* out = model.get_model()->forward(graph_arena, x);

float max_v = -1e9f; int pred = 0;
for (int i = 0; i < 10; ++i) {
    // Index through the output tensor's offset, as in the evaluation loop
    float v = out->data->storage->data[out->data->offset + i];
    if (v > max_v) { max_v = v; pred = i; }
}

std::cout << "Actual: " << labels_raw[n][0]
          << "  Predicted: " << pred << std::endl;
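How `draw_mnist_digit` renders the image is not shown here. One plausible approach, sketched below under that assumption, maps each normalized pixel to the 24-step grayscale ramp of the ANSI 256-color palette (indices 232 to 255) and prints it as a colored background cell. `render_gray` is a hypothetical function, not the library's.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Render a row-major image with values in [0, 1] as ANSI 256-color text.
// Each pixel becomes two spaces on a grayscale background (palette 232-255).
std::string render_gray(const float* pixels, std::size_t rows, std::size_t cols) {
    assert(pixels != nullptr);
    std::string out;
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            float v = pixels[r * cols + c];
            if (v < 0.0f) v = 0.0f;  // clamp out-of-range values
            if (v > 1.0f) v = 1.0f;
            const int color = 232 + static_cast<int>(v * 23.0f + 0.5f);
            out += "\x1b[48;5;" + std::to_string(color) + "m  ";
        }
        out += "\x1b[0m\n";  // reset colors at the end of each line
    }
    return out;
}
```

With this sketch, `std::cout << render_gray(features[n].data(), 28, 28);` would print a 28-line picture of the selected digit. Two spaces per pixel roughly compensate for terminal cells being taller than they are wide.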

Running the example

cd examples/mnist

# Train (writes bin/mnist_model.bin)
./bin/train_mnist

# Interactive inference
./bin/inference_mnist
# → Enter sample index (0 to 9999): 42
# → [terminal digit visualization]
# → Actual: 7  Predicted: 7

Bonus: Autoencoder

examples/mnist/autoencoder.cpp trains an encoder–decoder network to compress each 784-dimensional digit into a 32-dimensional latent space and reconstruct it:

Input(784) → Linear(784→128) → ReLU
           → Linear(128→32)  → ReLU   ← bottleneck
           → Linear(32→128)  → ReLU
           → Linear(128→784) → Sigmoid

The loss is MSELoss between the input and its reconstruction, so no labels are needed. After training, the program prints each original digit and its reconstruction side by side in the terminal.

./bin/autoencoder
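The reconstruction objective is simple enough to write out by hand. The function below is a standalone sketch of mean squared error for one sample; `mse` is an illustrative name, and the library's MSELoss may reduce over the batch differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mean squared error between an input and its reconstruction.
float mse(const std::vector<float>& input, const std::vector<float>& recon) {
    assert(!input.empty() && input.size() == recon.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < input.size(); ++i) {
        const float d = input[i] - recon[i];
        sum += d * d;
    }
    return sum / static_cast<float>(input.size());
}
```

Because the target is the input itself, the same `features` matrix serves as both arguments to training. The final Sigmoid keeps reconstructions in [0, 1], the same range as the normalized pixels.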

Key concepts recap

Concept	Where it appears
`CSVLoader::parse_mnist_csv`	Parsing MNIST CSV format
`CSVLoader::normalize_minmax`	Pixel value scaling
`CSVLoader::one_hot_encode`	Converting integer labels to distributions
`nn::Model::compile` / `train`	High-level training API
`model.get_model()->eval()`	Switching to inference mode
`model.save` / `model.load`	Checkpoint persistence
`autograd::create_leaf(..., false)`	Inference without gradient tracking
`graph_arena->pop_to`	O(1) graph memory reclamation

Next steps

You now understand the core training loop and inference patterns. From here you can explore:

Module deep dives — how tensor, autograd, nn, and optim are implemented internally.
Adding new layers — see the Contributing Guide for how to add a custom activation or layer type.
Extending the autoencoder — try varying the bottleneck dimension or adding noise to inputs for denoising autoencoding.
