Collecting and labeling data can be resource intensive, so the amount of data available to the neural-network designer is often limited. Furthermore, properly training and testing a neural network requires splitting the data into training, validation, and test sets, which further reduces the amount of data available for each purpose. Several statistical techniques have been developed for dealing with limited amounts of data. These techniques repeatedly resample the data into a series of sets. The neural-network designer can also use them to judge the performance of neural networks developed with limited data.
Statisticians originated these techniques for statistical estimation and classification, but they apply equally to neural networks. In statistics, they are used to estimate a model's generalization error and to choose its structure [Weiss, 1991; Efron, 1993; Hjorth, 1994; Plutowski, 1994; Shao, 1995]. With neural networks, the designer can use them to choose the network architecture, the number of hidden neurons, the salient inputs, the training parameters, and so on, and also to evaluate the network's general performance. The techniques most commonly used with neural networks to deal with limited data are k-fold cross-validation, leave-one-out cross-validation, jack-knife resampling, and bootstrap resampling. When large amounts of data are available and the data are representative of the entire population, these techniques are generally not needed.
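As a rough illustration of what resampling the data into a series of sets can look like, the following Python sketch generates index splits for k-fold cross-validation and for one bootstrap resample. The function names `k_fold_indices` and `bootstrap_indices`, the `seed` parameter, and the interleaved fold assignment are assumptions made for this example and are not taken from the text. Setting `k` equal to the number of samples gives leave-one-out cross-validation as a special case.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) index lists for k-fold cross-validation.

    Hypothetical helper for illustration only.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, val_idx in enumerate(folds):
        # Train on all folds except the one held out for validation.
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, val_idx


def bootstrap_indices(n_samples, seed=0):
    """Return (train_idx, oob_idx) for a single bootstrap resample.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    # Draw n_samples indices with replacement; duplicates are expected.
    train_idx = [rng.randrange(n_samples) for _ in range(n_samples)]
    chosen = set(train_idx)
    # Samples never drawn ("out of bag") can be used for evaluation.
    oob_idx = [j for j in range(n_samples) if j not in chosen]
    return train_idx, oob_idx
```

In a sketch like this, the designer would train one network per split and average the validation errors to estimate generalization error; every sample is used for validation exactly once across the k folds, which is what makes the approach attractive when data are scarce.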