The classification of objects in images is a complex problem in computer vision for which many approaches have been tried but for which a general solution has not yet been achieved.1 However, there are many practical applications in which a successful visual recognition system could provide useful information about its environment: for example, the detection of regions of interest and objects for a target recognition system,2-4 and the identification of roads or drivable regions4,5 in autonomous vehicle applications. The robust and practical solution of this problem therefore remains an important goal of computer vision research. One well-known approach to object recognition is to define three-dimensional wire-frame models of the objects to be recognised and to search for matches between model features and image features.6,7 A verification process is then needed to choose the correct match from amongst the several hypothetical matches that are likely to be found for each model. In principle, this approach can be extended to recognise an arbitrary number of objects, although often at the expense of considerable computational complexity.1 This hypothesis-verification paradigm is characteristic of model-based techniques of object recognition. An alternative approach is to train a statistical classifier, such as a neural network, to recognise objects from a set of representative examples extracted from two-dimensional images. Such a system may therefore obviate some of the need for manual intervention in the design of object models.
Furthermore, since the network is trained on data derived from the whole image, which may preserve the contextual relationships between one candidate object and another in the scene, the network might also learn the contextual significance of objects in an image.8 This may be particularly relevant when the task is to label large regions in an image, such as roads, or even the whole image.9,10 Consider the specific problem of labelling road regions segmented from images of outdoor scenes. This is a particularly difficult recognition problem, since such scenes potentially contain very many objects and regions, which may be subject to occlusion and to variable illumination conditions. A single object may give rise to many regions in the image, or a single region may include parts of a number of different objects. Indeed, individual regions may at best be produced by shadows or other accidents of shading, or at worst be artifacts of the algorithm used to segment them from the image. Nevertheless, the feasibility of using a neural network to label such regions was originally demonstrated by Wright8: a multilayer perceptron (MLP) network was used to classify two region types (road and not_road) from automatic segmentations of thirty-two different black-and-white images of road scenes. The MLP network was trained to generate the correct label for regions segmented from a previously unseen image, taking as its inputs a number of features computed from that region and from neighbouring regions. Although limited, the performance of this system was shown to significantly exceed that of a K Nearest Neighbour classifier on the same data. Recent work by Mackeown et al.9 has extended this approach to a larger number of images and a more comprehensive label set. The research described here was undertaken with the specific aim of finding a tight lower bound on the best achievable performance of road finding using a neural network.
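To make the idea of context-dependent inputs concrete, the following Python sketch builds an input vector for one region by concatenating its own features with the mean of its neighbours' features, and then labels it with a simple K Nearest Neighbour vote of the kind used as a baseline above. It is illustrative only: the functions `context_vector` and `knn_classify`, the averaging of neighbour features and the Euclidean distance are assumptions made for this example, not the input encoding actually used by Wright or in this work.

```python
import math
from collections import Counter


def context_vector(region_id, features, neighbours):
    """Concatenate a region's own features with the mean of its neighbours' features.

    Hypothetical encoding for illustration.
    features:   dict region_id -> list of floats (per-region feature vector)
    neighbours: dict region_id -> list of adjacent region ids
    """
    own = features[region_id]
    nbrs = neighbours.get(region_id, [])
    if nbrs:
        # Element-wise mean over all adjacent regions.
        mean = [sum(features[n][i] for n in nbrs) / len(nbrs) for i in range(len(own))]
    else:
        # Isolated region: no context available, so pad with zeros.
        mean = [0.0] * len(own)
    return own + mean


def knn_classify(query, train_x, train_y, k=3):
    """Label `query` by majority vote among its k nearest training vectors (Euclidean)."""
    dists = sorted((math.dist(query, x), y) for x, y in zip(train_x, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]
```

The same context vectors could equally be fed to an MLP; the K Nearest Neighbour vote is shown here only because it needs no training loop.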
The method has been generalised to use colour image data, which forms part of a new database of two hundred and forty 24-bit colour images produced under carefully monitored conditions. This is a significantly more extensive and better understood dataset than that originally used by Wright, and it has made possible a more detailed quantitative performance evaluation of this approach. To separate the problems of segmentation (locating regions of interest) and classification (labelling the located regions), and so allow us to concentrate on classification, the idea of an "ideal segmentation" is used here.9,10 This also allows us to address the accessibility and sufficiency of the information contained in region-based segmentations of single images for whole-scene classification tasks. The image database is described briefly in section 2. The "ideal segmentation", and the approximation to it used in practice, are explained in section 3, together with the labelling process. A number of geometrical and statistical features are calculated from the regions generated by the segmentation process; these are described in section 4. The network uses these features to classify (i.e. label) the regions, so the performance of the classifier depends on their information content. The structure of the neural network and the method used to train it are presented in section 5. The results obtained from this system are discussed in section 6, which gives new performance figures for a network trained specifically to label image regions as either 'Road' or 'Not_road'. Finally, the conclusions drawn from this work are given in section 7.
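As a hedged illustration of what region-based features can look like, the sketch below assumes that a segmentation is stored as a 2-D array of region ids aligned with an RGB image, and computes per-region area, centroid and mean colour, together with the region adjacency needed for neighbour-based inputs. The function names `region_features` and `adjacency` and the particular choice of features are hypothetical; the features actually used are those described in section 4.

```python
from collections import defaultdict


def region_features(labels, image):
    """Per-region geometric and colour statistics from a labelled segmentation.

    Hypothetical feature set for illustration.
    labels: 2-D list of region ids, one per pixel
    image:  2-D list of (r, g, b) tuples with the same shape as `labels`
    Returns dict region_id -> {"area", "centroid" (row, col), "mean_rgb"}.
    """
    # Running sums per region: pixel count, row sum, column sum, colour sums.
    acc = defaultdict(lambda: {"n": 0, "sr": 0, "sc": 0, "rgb": [0, 0, 0]})
    for r, row in enumerate(labels):
        for c, rid in enumerate(row):
            a = acc[rid]
            a["n"] += 1
            a["sr"] += r
            a["sc"] += c
            for i in range(3):
                a["rgb"][i] += image[r][c][i]
    feats = {}
    for rid, a in acc.items():
        n = a["n"]
        feats[rid] = {
            "area": n,
            "centroid": (a["sr"] / n, a["sc"] / n),
            "mean_rgb": tuple(v / n for v in a["rgb"]),
        }
    return feats


def adjacency(labels):
    """Region adjacency under 4-connectivity: ids whose pixels share an edge.

    Regions with no neighbours do not appear in the result.
    """
    adj = defaultdict(set)
    h, w = len(labels), len(labels[0])
    for r in range(h):
        for c in range(w):
            a = labels[r][c]
            # Checking only right and down visits every pixel edge exactly once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < h and cc < w and labels[rr][cc] != a:
                    adj[a].add(labels[rr][cc])
                    adj[labels[rr][cc]].add(a)
    return dict(adj)
```

With an ideal segmentation the region boundaries are fixed in advance, so any classification errors can be attributed to the information content of features such as these rather than to the segmenter.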