Initialization
Good weight initialisation is not optional. In a deep network, weights that are too small shrink activations and gradients layer by layer until they vanish, while weights that are too large amplify them until they explode. The functions in nn::init provide the standard schemes that the deep learning literature has converged on over the past decade.
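To make the vanishing/exploding trade-off concrete, here is a minimal, library-independent sketch in plain Python. It does not call nn::init, and the helper names (`kaiming_normal_std`, `xavier_uniform_bound`, `final_rms`) are hypothetical. It computes the scale for two widely used schemes, Kaiming/He normal (std = sqrt(2 / fan_in), aimed at ReLU layers) and Xavier/Glorot uniform (bound = sqrt(6 / (fan_in + fan_out))), then pushes a random input through a stack of bias-free ReLU layers to compare a He-scaled std against one that is half as large and one that is twice as large.

```python
import math
import random


def kaiming_normal_std(fan_in: int) -> float:
    """Hypothetical helper: He/Kaiming normal std for ReLU layers, sqrt(2 / fan_in)."""
    return math.sqrt(2.0 / fan_in)


def xavier_uniform_bound(fan_in: int, fan_out: int) -> float:
    """Hypothetical helper: Xavier/Glorot bound a, for weights drawn from U(-a, a)."""
    return math.sqrt(6.0 / (fan_in + fan_out))


def final_rms(std: float, width: int = 128, depth: int = 20, seed: int = 0) -> float:
    """Push a random input through `depth` square ReLU layers whose weights
    are drawn from N(0, std^2), and return the RMS of the last activations."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(width)]
    for _ in range(depth):
        # One dense layer (no bias) with fresh Gaussian weights, then ReLU.
        x = [
            max(0.0, sum(rng.gauss(0.0, std) * xj for xj in x))
            for _ in range(width)
        ]
    return math.sqrt(sum(v * v for v in x) / width)


if __name__ == "__main__":
    he = kaiming_normal_std(128)
    for label, s in [("0.5 x He", 0.5 * he), ("He", he), ("2 x He", 2.0 * he)]:
        print(f"{label:>8}: final RMS = {final_rms(s):.3g}")
```

Each ReLU layer multiplies the mean squared activation by roughly fan_in · std² / 2, so after 20 layers the three runs should land on the order of 1e-6, 1, and 1e6 respectively. Only the He-scaled std keeps the signal at a usable magnitude.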