We analyse spaces of deep neural networks with a fixed architecture. We demonstrate that, when interpreted as a set of functions, spaces of neural networks exhibit many unfavourable properties: They are highly non-convex and not closed with respect to Lp-norms, for 0 < p < ∞ and all commonly used activation functions. They are also not closed with respect to the L∞-norm for almost all practically used activation functions; here, the (parametric) ReLU is the only exception. Finally, we show that the function that maps a family of neural network weights to the associated functional representation of a network fails to be inverse stable for every practically used activation function.
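As a minimal illustration of the non-closedness statement (a standard example, not necessarily the construction used in the paper), consider ReLU networks with one hidden layer of two neurons on the interval [-1, 1]. The functions f_n(x) = ReLU(nx + 1) - ReLU(nx) all share this fixed architecture and converge in Lp, for 0 < p < ∞, to the Heaviside step function, which is discontinuous and therefore not equal almost everywhere to any ReLU network. The L∞ distance, in contrast, stays close to 1. The names `network`, `lp_error`, and `sup_error`, the interval, and the grid size are assumptions made for this sketch.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def network(x, n):
    # Fixed architecture: one hidden layer with two ReLU neurons.
    # Hidden weights (n, n), hidden biases (1, 0), output weights (+1, -1).
    return relu(n * x + 1.0) - relu(n * x)

def heaviside(x):
    # Candidate limit: a step function, discontinuous at 0.
    return (x >= 0).astype(float)

def lp_error(n, p, num=200_001):
    # Riemann-sum approximation of the Lp distance on [-1, 1].
    x = np.linspace(-1.0, 1.0, num)
    diff = np.abs(network(x, n) - heaviside(x)) ** p
    integral = diff.mean() * 2.0  # interval length is 2
    return integral ** (1.0 / p)

def sup_error(n, num=200_001):
    # Grid approximation of the sup-norm distance on [-1, 1].
    x = np.linspace(-1.0, 1.0, num)
    return float(np.max(np.abs(network(x, n) - heaviside(x))))

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, lp_error(n, p=2), sup_error(n))
```

For this family the p-th power of the Lp distance equals 1/(n(p + 1)) exactly, so the Lp error vanishes as n grows while the hidden weights diverge. The sup-norm error does not vanish, which is consistent with the ReLU being the exception in the L∞ case.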
We summarize the main results of a recent theory, developed by the authors, establishing fundamental lower bounds on the connectivity and memory requirements of deep neural networks as a function of the complexity of the function class to be approximated by the network. These bounds are shown to be achievable. Specifically, all function classes that are optimally approximated by a general class of representation systems, so-called affine systems, can be approximated by deep neural networks with minimal connectivity and memory requirements. Affine systems encompass a wealth of representation systems from applied harmonic analysis such as wavelets, shearlets, ridgelets, α-shearlets, and more generally α-molecules. This result elucidates a remarkable universality property of deep neural networks and shows that they achieve the optimal approximation properties of all affine systems combined. Finally, we present numerical experiments demonstrating that the standard stochastic gradient descent algorithm generates deep neural networks which provide close-to-optimal approximation rates at minimal connectivity. Moreover, stochastic gradient descent is found to actually learn approximations that are sparse in the representation system optimally sparsifying the function class the network is trained on.
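To make the two quantities in these bounds concrete, the following sketch counts them for a small network. It assumes that connectivity means the number of nonzero entries of the weight matrices and that memory means the bits needed to store those nonzero weights at a fixed precision; the abstract does not state these definitions, and the helper names and the example network are hypothetical.

```python
import numpy as np

def connectivity(weights):
    # Assumed definition: total number of nonzero entries
    # across all weight matrices (biases are not counted here).
    return int(sum(np.count_nonzero(W) for W in weights))

def memory_bits(weights, bits_per_weight):
    # Assumed definition: bits to store the nonzero weights at a
    # fixed precision; the cost of storing their positions is ignored.
    return connectivity(weights) * bits_per_weight

# Hypothetical network with layer widths 2 -> 4 -> 1 and a sparse first layer.
W1 = np.array([[1.0, 0.0],
               [0.0, -2.0],
               [0.0, 0.0],
               [0.5, 0.0]])
W2 = np.array([[1.0, -1.0, 0.0, 3.0]])
layers = [W1, W2]

if __name__ == "__main__":
    print(connectivity(layers), memory_bits(layers, bits_per_weight=8))
```

Under these assumptions, a sparsely connected network is cheaper in both measures than a dense network of the same layer widths, which is the sense in which minimal connectivity is discussed here.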