Active vision is characterized by a closed loop linking sensing with acting: an active vision system's behavior is directly determined by what it senses. To date, however, the responses produced by active vision systems have tended to be relatively low-level, generally designed to facilitate improved sensing, for example by enhancing the duration or speed of object tracking, or by optimizing the focused application of more intensive image processing. This is probably adequate if the active vision system is designed as a front end to other processes or to specialized application systems, or if it is a demonstration in support of a theoretical vision model. However, it leaves unanswered the problems of i) how to select an appropriate action when many different alternatives are available, and ii) how best to modify the behavioral repertoire of the system. These problems are especially important in two situations: firstly, when an autonomous system faces a novel situation and must respond adaptively without the benefit of a priori knowledge, and secondly, when systems attempt higher levels of perception and response, where the links between the absolute properties of the incoming image data and the actual objects of perception become increasingly attenuated. This paper discusses methods for linking learning with active vision so that the behavior of the system is optimized over time for the achievement of goals. We argue the necessity of system goals in learning vision systems, and discuss methods for propagating goals through all levels of loose hierarchies. In the last section we outline an architecture in which high-level and low-level perception operate interactively and in parallel.
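
The closed sensing-acting loop with goal-directed action selection can be illustrated with a deliberately minimal sketch. This is not the paper's architecture; all names here (`World`, `sense`, `act`, `select_action`, `run_loop`) are hypothetical, and the one-dimensional world is chosen purely so the loop structure is visible.

```python
class World:
    """Toy 1-D world: the agent observes a scalar position and can move +/-1."""
    def __init__(self, position=0):
        self.position = position

    def sense(self):
        return self.position          # sensing: read the current state

    def act(self, action):
        self.position += action       # acting: the action alters what is sensed next


def select_action(observation, goal):
    """Goal-directed selection among alternatives: step toward the goal."""
    if observation < goal:
        return 1
    if observation > goal:
        return -1
    return 0                          # goal achieved: no action needed


def run_loop(world, goal, max_steps=100):
    """Closed loop: sense, select an action with respect to the goal, act, repeat."""
    for _ in range(max_steps):
        obs = world.sense()
        action = select_action(obs, goal)
        if action == 0:
            break
        world.act(action)
    return world.sense()


# Starting at position 0 with goal 5, the loop converges to 5.
final = run_loop(World(0), goal=5)
```

The point of the sketch is structural: behavior is produced only through the sense-select-act cycle, and the system's goal enters the loop at the action-selection step, which is exactly where learning would be inserted to modify the behavioral repertoire over time.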