The classification of pixels in hyperspectral imagery is often made more challenging by the availability of only small training sets. Indeed, the number of training samples per class is often smaller, sometimes considerably smaller, than the dimensionality of the problem. Various techniques may be used to mitigate this problem: regularized discriminant analysis is one, and schemes that select subspaces of the original problem are another. This paper is concerned with the latter class of approaches, which reduce the dimensionality of the problem sufficiently that conventional statistical pattern recognition techniques may be applied. The paper compares classification results produced using three schemes that can tolerate very small training sets. The first is a conventional feature subset selection method that uses information from scatter matrices to choose suitable features. The second is the random subspace method (RSM), an ensemble classification technique that builds many 'basis' classifiers, each using a different randomly selected subspace of the original problem; the classifications produced by the basis classifiers are merged through voting to generate the final output. The third method also builds an ensemble of classifiers, but uses a smaller number that span the feature space in a deterministic way; again, voting is used to merge the individual classifier outputs. In this paper the three feature selection methods are used in conjunction with a variant of the piecewise quadratic classifier, a classifier type known to produce good results for hyperspectral pixel classification when the training sample sizes are large. The data examined in the paper are the well-known AVIRIS Indian Pines image, a largely agricultural scene containing some difficult-to-separate classes. Removal of absorption bands has reduced the dimensionality of the data to 200.
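The random subspace scheme described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the basis classifier here is a plain Gaussian quadratic classifier (class mean and covariance) rather than the piecewise variant used in the paper, and all function names and parameter values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_gaussian(X):
    # Per-class mean and covariance; a small ridge keeps the
    # covariance invertible when the sample size is small.
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])
    return mu, cov

def log_likelihood(x, mu, cov):
    # Gaussian log-likelihood up to an additive constant.
    d = x - mu
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (logdet + d @ np.linalg.solve(cov, d))

def rsm_predict(X_train, y_train, X_test, n_classifiers=25, subspace=5):
    # Train each basis classifier on a random subset of the features,
    # then merge the individual classifications by majority vote.
    classes = np.unique(y_train)
    votes = np.zeros((X_test.shape[0], classes.size), dtype=int)
    for _ in range(n_classifiers):
        feats = rng.choice(X_train.shape[1], size=subspace, replace=False)
        models = [fit_gaussian(X_train[y_train == c][:, feats])
                  for c in classes]
        for i, x in enumerate(X_test[:, feats]):
            scores = [log_likelihood(x, mu, cov) for mu, cov in models]
            votes[i, int(np.argmax(scores))] += 1
    return classes[votes.argmax(axis=1)]
```

Because each basis classifier sees only a low-dimensional subspace, its covariance estimates remain usable even when the per-class sample size is far below the full dimensionality of the data.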
A two-class classification problem is examined in detail to determine the characteristic performance of the classifiers. In addition, more realistic 7-, 13- and 17-class problems are also studied. Results are calculated over a range of training set sizes and a range of feature subset sizes for each classifier type. When the training sets are large, results produced using the selected feature set and a single classifier outperform the ensemble approaches, and tend to continue to improve as the number of features is increased. At the critical per-class sample size, of the order of the dimensionality of the problem, results produced using the selected feature set outperform the random subspace method for all but the largest subspace sizes attempted. For the smallest training sets the best performance is returned by the random subspace method, with the alternative ensemble approach producing competitive results over a narrower range of subspace sizes. The limited performance of the standard feature selection approach for very small samples is a consequence of poor estimation of the scatter matrices, which in turn causes the best features to be omitted from the selection. The ensemble approaches used here do not rely on these estimates, and the high degree of correlation between neighboring features in hyperspectral data allows a large number of 'reasonable' classifiers to be produced. The combination of these classifiers is capable of producing a robust output even in very small sample cases.
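The scatter-matrix-based selection whose small-sample behaviour is discussed above can be illustrated with a per-feature Fisher criterion, i.e. the ratio of between-class to within-class scatter evaluated feature by feature. This is a simplified stand-in for the paper's method (it scores each feature independently rather than evaluating full subsets), and the function names are hypothetical; when the per-class sample size is tiny, the scatter estimates it relies on become noisy and truly discriminative features can be missed.

```python
import numpy as np

def fisher_scores(X, y):
    # Ratio of between-class to within-class scatter for each feature.
    classes = np.unique(y)
    overall = X.mean(axis=0)
    sb = np.zeros(X.shape[1])  # between-class scatter (diagonal)
    sw = np.zeros(X.shape[1])  # within-class scatter (diagonal)
    for c in classes:
        Xc = X[y == c]
        sb += Xc.shape[0] * (Xc.mean(axis=0) - overall) ** 2
        sw += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return sb / (sw + 1e-12)

def select_features(X, y, k):
    # Keep the k features with the largest Fisher scores.
    return np.argsort(fisher_scores(X, y))[::-1][:k]
```

With ample samples the criterion reliably ranks discriminative bands highly; with very few samples per class, both scatter estimates degrade, which is the failure mode the ensemble approaches sidestep by not depending on these estimates at all.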