One of the purposes of computer vision is to reconstruct a three-dimensional description of the environment from multiple sensor images. Because no single image can show all salient features of a complex scene, an intelligently chosen set of images is needed to provide a complete description, and because sensor data are generally incomplete and error-prone, a priori knowledge about the scene and the sensors must be used in the reconstruction process.

This paper will describe methods for applying scene and sensor knowledge to the problem of three-dimensional reconstruction from multiple images. In particular, the geometric representation and reasoning techniques applied to generic 3D object recognition in the 3D FORM system [Walk88, Walk90] will be extended to reason about segmented 2D images. Knowledge about the sensors and the objects in the scene will be represented as frames in the 3D FORM system: for each sensor, the geometric relationships among the sensor's pose, the image features, and the world features are modeled, and for each object, the geometric relationships between the object and its parts are modeled.

Three-dimensional reconstruction is performed by transforming each sensor image into a set of constraints on the world, and then combining the constraints from all sensors with the constraints imposed by the object models to generate an interpretation that satisfies them all. The advantage of this method is that the resulting system can adjust itself to the available information without knowing in advance which constraints will be specified.
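The combination step above can be illustrated with a minimal sketch. This is not the 3D FORM implementation; it assumes, for illustration only, that each sensor view and each object model contributes an interval constraint on a single scene parameter (here, a hypothetical object height), and that the interpretation is the intersection of all such constraints.

```python
def intersect(constraints):
    """Combine interval constraints (lo, hi); return the consistent
    interpretation interval, or None if the constraints conflict."""
    lo = max(c[0] for c in constraints)
    hi = min(c[1] for c in constraints)
    return (lo, hi) if lo <= hi else None

# Hypothetical constraints on an object's height, in cm:
sensor_a = (10.0, 50.0)   # bounds derived from sensor view A
sensor_b = (25.0, 60.0)   # bounds derived from sensor view B, another pose
model    = (20.0, 40.0)   # bounds imposed by the generic object model

# The interpretation satisfies every sensor and model constraint at once.
interpretation = intersect([sensor_a, sensor_b, model])
print(interpretation)  # → (25.0, 40.0)
```

Because the intersection operator is the same no matter how many constraints are supplied, the sketch also reflects the stated advantage: the system need not know in advance which sensors or models will contribute constraints.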