Three-dimensional incoherently illuminated scenes can be captured
electronically by measuring the intensity distribution in a volume behind the lens.
The 3-D image is reconstructed by an inversion algorithm employing the measured,
space-variant, 3-D point spread function of the lens. A computer simulation using a
measured PSF illustrates the effectiveness of the technique.
A method for gauging the distance from a video camera to an object of interest is described. By using a
calibrated camera-lens system, range was related to focus of a selected object. Optimum focus of the image was
determined by maximizing the high-frequency content of the Fourier transform of the object image. The Walsh-
Hadamard transform was investigated as an alternative focusing function. Software was developed to determine
optimum image focus and control a motorized camera lens. Range values from the video camera to target objects
were calculated by the system. Calculated values were compared with measured distances. For any given distance,
the difference between calculated and actual distance averaged less than 1.2%. Distance values calculated using
the Walsh-Hadamard transform differed from values calculated with the Fourier transform by less than 1%.
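The focus criterion described above (maximizing the high-frequency content of the Fourier transform) can be sketched as follows. This is a minimal illustration, not the authors' software: the function names, the radial cutoff of 0.25 cycles/pixel, and the box-blur stand-in for defocus are all assumptions made for the example.

```python
import numpy as np

def high_frequency_energy(image, cutoff=0.25):
    """Sum of spectral power above a radial frequency cutoff (cycles/pixel).

    The cutoff value is an illustrative assumption, not taken from the paper.
    """
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    power = np.abs(spectrum) ** 2
    fy = np.fft.fftshift(np.fft.fftfreq(image.shape[0]))
    fx = np.fft.fftshift(np.fft.fftfreq(image.shape[1]))
    radius = np.hypot(fy[:, None], fx[None, :])
    return float(power[radius > cutoff].sum())

def best_focus_index(images, cutoff=0.25):
    """Index of the image (e.g. one per lens position) with the largest score."""
    scores = [high_frequency_energy(img, cutoff) for img in images]
    return int(np.argmax(scores))

def box_blur(image, passes):
    """Crude defocus stand-in: repeated 3x3 circular box averaging."""
    out = np.asarray(image, dtype=float)
    for _ in range(passes):
        out = sum(np.roll(np.roll(out, dy, 0), dx, 1)
                  for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0
    return out
```

In use, one image would be captured per motorized lens position and `best_focus_index` would select the position of optimum focus. Converting that position to a range requires the camera-lens calibration, which is not reproduced here.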
The problem of estimating the position of and tracking an object
undergoing 3-D translational and rotational motion using passive and active
sensors is considered. The passive sensor used in this study is a stereo
camera, whereas the active sensor is a range radar. Three different estimation
approaches are considered. The first involves estimation of the object
position by direct registration of stereo images. In the second approach, the
Extended Kalman Filter is used for estimation, with the stereo images as
measurements. In the third approach, an integral filter based on stereo images and
range radar measurements is used for tracking. The three different approaches
are compared via simulation in the tracking of an object undergoing a 3-D
motion with random translational and angular acceleration.
Reconstructing 3D surfaces using multiple intensity images is an important problem in computer vision. Most
approaches for this problem require either finding the 2D feature correspondences or estimating the optical flow first.
The probabilistic model-based approach shown in this paper utilizes the intensity data directly (i.e., no feature extraction)
for reconstructing 3D surfaces without first solving the correspondence problem or estimating optical flow explicitly.
We model 3D objects as surface patches where each patch is described as a function known up to the values of
a few parameters. Surface reconstruction is then treated as the problem of parameter estimation based on two or
more images taken by a moving camera. By constructing the likelihood function for the surface parameters and
modeling prior knowledge about 3D surfaces with a Markov random field, we are able to compute the maximum
posterior probability 3D surface reconstruction based on the observed images. This paper presents some experimental
results based on a sequence of intensity images taken by a moving camera. Our approach has the advantages of: (i)
directly estimating shape for surface patches, thus making object recognition simpler; (ii) formally incorporating prior
knowledge about 3D surfaces; (iii) being highly parallel in required computation, hence promising for real-time
operation; (iv) producing optimal accuracy in a probabilistic sense; (v) being algorithmically simple; (vi) being robust
with real data.
We present a method for recovering structure and motion from a sequence of images which does not require
computation of the optical flow. We build on the previously developed "Direct Motion Vision" approach for
estimation of structure and motion from two frames. In this formulation structure and motion are obtained
directly from the gradient of image brightness which is computed orders of magnitude faster than optical flow.
The direct methodology is applied to estimate motion and normal of a planar surface relative to a camera.
Using a dynamical model of the camera motion we then show how measurements from an arbitrarily long sequence
of images can be integrated with the help of an observer/filter to improve the estimate over time. Experimental
results on real images are presented.
Much research effort has been expended on attempting to calculate a description of the structure of an
arbitrary environment from the information present in the image motion obtained by an observer in this environment.
Most techniques use the image velocity from across somewhat large regions of the image to compute
the parameters of 3-dimensional motion and the shape parameters of the viewed objects. The calculations
generally require many image velocity measurements as input, a requirement that is often impractical. Thus, it
is interesting to examine what can be achieved even when only a few such measurements are used.
In this paper, we first show that given image velocity measurements at two positions in the image, it
is possible to cancel out the rotation component of motion. Then, we use this to show that measurements
from three positions, when combined with advance knowledge of just the direction of the motion's translation
component, yield information about the shape of the environment. This result holds for arbitrary instantaneous
motion between an observer and surfaces in the environment, and also permits qualitative judgements that could
be used as end results in themselves if richer descriptions of shape were unnecessary.
This paper presents a new method for estimating motion parameters of a set of 3-D points without correspondences
between points at two time instants. All point coordinates are given in projections. Three scaled orthographic
projections taken before and after the motion are used to determine the scaled translation vector and to recover the
3-D scatter matrix. Once the 3-D scatter matrix is determined at two time instants, we can find the rotation matrix
using an eigenvector decomposition method. The proposed method requires neither correspondences in projections nor
correspondences between two time instants.
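The final step, recovering the rotation from the two 3-D scatter matrices by eigenvector decomposition, might look like the sketch below. It assumes the scatter matrices have already been recovered and have distinct eigenvalues. Eigenvectors are only defined up to sign, so the sketch returns every proper-rotation candidate rather than a single answer; how the paper resolves that ambiguity is not stated here. All function names are hypothetical.

```python
import itertools
import numpy as np

def scatter_matrix(points):
    """3x3 scatter matrix of an (N, 3) point set about its centroid."""
    centered = points - points.mean(axis=0)
    return centered.T @ centered

def rotation_candidates(scatter_before, scatter_after):
    """Proper rotations R satisfying scatter_after = R @ scatter_before @ R.T.

    Assumes distinct eigenvalues. Each eigenvector has a free sign, which
    leaves four candidates with determinant +1.
    """
    _, u1 = np.linalg.eigh(scatter_before)  # columns: eigenvectors, ascending
    _, u2 = np.linalg.eigh(scatter_after)
    candidates = []
    for signs in itertools.product((1.0, -1.0), repeat=3):
        r = u2 @ np.diag(signs) @ u1.T
        if np.linalg.det(r) > 0:
            candidates.append(r)
    return candidates
```

Because the scatter matrix is computed about the centroid, it is unaffected by translation and needs no point-to-point correspondences, which is what makes it usable in this setting.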
In this paper, we show how the translational motion of a stereo vision system relative
to, and its distance from, the scene can be recovered in closed form directly from the measurements
of image gradients and time derivatives. There is no need to estimate image motion or establish
correspondences between features across images. The direction of translational motion is recovered
using a procedure which involves minimizing the sum squared error of a linear constraint equation
over the image. The solution is given in terms of the eigenvector corresponding to the smallest
eigenvalue of a 3 x 3 positive semi-definite matrix. Using the average disparity, which maximizes the
crosscorrelation between the left and right images, we estimate the scale-factor necessary to compute
the magnitude of the translational motion, and consequently the distance to the scene.
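The closed-form direction estimate can be illustrated with a small sketch. It assumes each pixel contributes a linear constraint of the form a_i . t = 0 on the unit translation direction t. The vectors a_i are hypothetical placeholders here: in the paper they are built from image gradients and time derivatives, and their exact form is not given in this abstract.

```python
import numpy as np

def translation_direction(constraint_vectors):
    """Unit vector t minimizing sum_i (a_i . t)^2 subject to |t| = 1.

    constraint_vectors: (N, 3) array whose rows are the per-pixel vectors a_i
    (hypothetical inputs for this sketch). The minimizer is the eigenvector of
    the 3x3 positive semi-definite matrix sum_i a_i a_i^T that belongs to its
    smallest eigenvalue. The sign of t is not determined by the constraint.
    """
    m = constraint_vectors.T @ constraint_vectors
    eigenvalues, eigenvectors = np.linalg.eigh(m)  # ascending eigenvalues
    return eigenvectors[:, 0]
```

Only the direction is obtained this way. As the abstract notes, the magnitude requires a separate scale factor, estimated there from the average stereo disparity.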
As one moves through a static environment, the visual world as projected on the retina seems to flow
past. This apparent motion, called optical flow, can be an important source of depth perception for
autonomous robots. An important application is in planetary exploration: the landing vehicle must find
a safe landing site in rugged terrain, and an autonomous rover must be able to navigate safely through
this terrain. In this paper, we describe a solution to this problem. Image edge points are tracked between
frames of a motion sequence, and the range to the points is calculated from the displacement of the edge
points and the known motion of the camera. Kalman filtering is used to incrementally improve the range
estimates to those points, and provide an estimate of the uncertainty in each range. Errors in camera
motion and image point measurement can also be modelled with Kalman filtering. A surface is then
interpolated to these points, providing a complete map from which hazards such as steeply sloping areas
can be detected. Using the method of extended Kalman filtering, our approach allows arbitrary camera
motion. Preliminary results of an implementation are presented, and show that the resulting range
accuracy is on the order of 1-2% of the range.
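The incremental refinement of a range estimate and its uncertainty can be sketched with a scalar Kalman update. This is a deliberate simplification: it treats the range to one point as a static quantity measured directly, whereas the paper uses extended Kalman filtering to handle arbitrary camera motion. The function names and the numbers in the usage note are illustrative assumptions.

```python
def kalman_update(estimate, variance, measurement, measurement_variance):
    """One scalar Kalman measurement update for a static range value."""
    gain = variance / (variance + measurement_variance)
    new_estimate = estimate + gain * (measurement - estimate)
    new_variance = (1.0 - gain) * variance
    return new_estimate, new_variance

def fuse_range_measurements(estimate, variance, measurements, measurement_variance):
    """Fold a sequence of range measurements into the running estimate."""
    for z in measurements:
        estimate, variance = kalman_update(estimate, variance, z, measurement_variance)
    return estimate, variance
```

For example, starting from a range of 10.0 with variance 4.0 and fusing a measurement of 12.0 with the same variance gives an estimate of 11.0 with variance 2.0. Each further measurement shrinks the variance, which is the per-point uncertainty the abstract refers to.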
Classical shape-from-shading and photometric stereo theories assume that diffuse reflection from real-world surfaces is Lambertian. However, there is considerable evidence that diffuse reflection from a large class of surfaces is non-Lambertian. Using a Lambertian model to reconstruct such surfaces can cause serious errors in the reconstruction. In this paper, we propose a theory of non-Lambertian shading and photometric stereo. First, we explore the physics of scattering and obtain a realistic model for the reflectance map of non-Lambertian surfaces. The reflectance map is significantly non-linear. We then explore the number of light sources and the conditions on their placement for a globally unique inversion of the photometric stereo equation for this reflectance map. We theoretically establish the minimum number of light sources needed to achieve this. These results are then extended in several directions. The main part of the extension is the joint estimation of surface normal along with the surface albedo. In the literature, this problem has been addressed only for Lambertian surfaces. We establish some basic results on the problem of joint estimation using the manifold structure of intensities obtained from photometric stereo. We will show that the joint estimation problem is ill-posed and propose a regularization scheme for it. Our experiments show that using the techniques proposed here, the fidelity of reconstruction can be increased by an order of magnitude over existing techniques.
This paper describes a method for estimating the surface spectral reflectance function of inhomogeneous
objects. The standard reflectance model for inhomogeneous materials suggests that surface
reflectance functions can be described as the sum of a constant (specular) function and a subsurface
(diffuse) function. First we present an algorithm to generate an illuminant estimate without using a
reference white standard. Next we show that several physical constraints on the reflectance functions can
be used to estimate the subsurface component. A band of the estimated spectral reflectance functions
is recovered as possible solutions for the subsurface component.
Three pulsed time-of-flight laser rangefinders have been developed for studying
the measurement of the 3-D shape of large objects. A manually scanned system is
suggested for manufacturing accuracy measurement and control for ship block assembly.
This system can be used to measure distance, plane regularity, angles,
spatial forms, etc., within a range from 3 m to 30 m with mm-level accuracy. The
other two scan automatically, based on galvanometer-driven mirrors and a
servo-controlled mechanical scanner. These systems are intended for applications
where it is important to be able to gather 3-D data automatically and with high
speed. The resolutions are also on the mm-level, but the measurement speed is
10 000 points/s at maximum.
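The relation underlying pulsed time-of-flight ranging is that the distance equals the speed of light times the round-trip time, divided by two. The sketch below shows what mm-level resolution implies for timing. It assumes the vacuum speed of light and ignores the refractive index of air; the function names are illustrative.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # vacuum value; air is ignored here

def range_from_round_trip(round_trip_seconds):
    """Distance to the target from the pulse round-trip time."""
    return SPEED_OF_LIGHT_M_PER_S * round_trip_seconds / 2.0

def timing_resolution_for(range_resolution_m):
    """Round-trip timing resolution needed for a given range resolution."""
    return 2.0 * range_resolution_m / SPEED_OF_LIGHT_M_PER_S
```

A round trip of 200 ns corresponds to about 30 m, the far end of the quoted working range, and 1 mm of range resolution corresponds to roughly 6.7 ps of round-trip timing resolution.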
The Holometrics 3-D vision system is an active laser ranging
sensor capable of high frame-rate mapping of the 3-D metric
properties of a scene contained within the sensor field-of-view
(FOV). The high resolution ranging function is accomplished with
well known laser ranging techniques. The range data acquired by
the system is manipulated by an image processor utilizing
proprietary software. The algorithmic operations extract 3-D
shape data on scene objects and are insensitive to object
rotation, orientation or partial obscuration. The sensor
acquisition of the 3-D data is independent of ambient lighting
conditions or visual contrast between object and background. The
elapsed time from scene scan to processor output is less than
three seconds. The system output data is used to support a wide
range of automated manufacturing, inspection, robotic,
navigation, bin-picking, assembly, ordnance guidance, etc.
applications. A block diagram of the Holometrics 3-D Vision
System is provided in Figure 1.
An analytic extension of the Hough Transform is introduced and analyzed, and an implementation is
demonstrated. The Hough Transform in its usual implementation has proven to be a useful tool for image
segmentation and feature extraction through identification of approximately collinear point sets in images.
The Analytic Hough Transform (AHT) algorithm significantly improves upon these results by operating
specifically with the information in spatially quantized images to yield those pixel sets that exactly define
digital lines in the image. The resulting pixel sets, while being subsets of a digital line set, need not be
contiguous. Thus the AHT also represents an alternative to digital line tests that depend upon contiguity.
An Inverse Analytic Hough Transform (IAHT) is also introduced. For a given quantized image the
AHT segments its Hough parameter space into convex polygons that represent all real line sets that pass
entirely through certain digital line pixel sets in the image. The IAHT converts these parameter space
polygons into a pair of convex hulls in image space. A real line passes between these hulls if and only if it
passes through every pixel connected with the parameter space polygon. Thus the IAHT generates a pair
of simple geometric boundaries in image space that associate pixels with polygonal AHT solution regions.
An implementation of the AHT is discussed and demonstrated. It is found that the AHT, with its exact
results, can be a computationally attractive alternative to the usual implementation of a high resolution
Hough Transform. Furthermore, the AHT and the IAHT effectively couple and efficiently find exact
solutions to the problems of digital line detection and determination of associated real line parameters.
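For context, the usual quantized-accumulator Hough Transform that the AHT is contrasted with can be sketched as below. This is the conventional baseline, not the AHT itself. The normal-form parameterization rho = x cos(theta) + y sin(theta), the bin sizes, and the function names are assumptions chosen for the illustration.

```python
import numpy as np

def hough_accumulator(points, num_theta=180, rho_step=1.0):
    """Vote in a quantized (rho, theta) space for lines through the points."""
    points = np.asarray(points, dtype=float)
    thetas = np.arange(num_theta) * np.pi / num_theta
    max_rho = np.hypot(points[:, 0], points[:, 1]).max()
    offset = int(np.ceil(max_rho / rho_step))  # row index of rho = 0
    acc = np.zeros((2 * offset + 1, num_theta), dtype=int)
    columns = np.arange(num_theta)
    for x, y in points:
        rho = x * np.cos(thetas) + y * np.sin(thetas)
        rows = np.round(rho / rho_step).astype(int) + offset
        acc[rows, columns] += 1  # one vote per theta column
    return acc, thetas, offset

def strongest_line(points, num_theta=180, rho_step=1.0):
    """Return (rho, theta, votes) for the fullest accumulator cell."""
    acc, thetas, offset = hough_accumulator(points, num_theta, rho_step)
    row, col = np.unravel_index(np.argmax(acc), acc.shape)
    return (row - offset) * rho_step, thetas[col], int(acc[row, col])
```

Several neighbouring cells can tie for the maximum, because any line within half a bin of all the voting pixels receives the same count. That is the sense in which this baseline finds only approximately collinear sets, the limitation the abstract says the AHT addresses.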
In order to recognize an arbitrary 3D object, it is often required to extract feature points
and feature lines from its surface model. The feature points and feature lines include peaks,
pits, ridge lines, and valley lines. In this paper, we present an efficient technique for finding the
features from the triangular surface model of an arbitrary 3D object. Given a set of surface
data points, we find, using the local adjustment technique, the triangular patches that best fit
the surface of the object. For the resulting triangle-based surface model, unit normal vectors
and side lengths of the triangular patches are used systematically to locate the feature points
and lines of the surface. We present experimental results on simple objects with feature points
and feature lines.
This paper presents the development and implementation of a method for reconstruction of three-dimensional
object information from silhouettes. Previous work has demonstrated the possibility of such
reconstruction based on the differential equations relating surface terminator curves and their projections,
but has not addressed important aspects of the implementation given spatially quantized images and a
finite number of silhouettes.
The method presented here is exact in that it makes appropriate use of angularly and spatially quantized
silhouette information to form convex bounds for non-convex objects. For a given set of quantized
silhouettes inner and outer convex hulls are obtained by means of an efficient algorithm. The true object
convex hull must lie between these two hulls which represent the tightest hulls that can be constructed
with the given information.
Results of reconstruction by the algorithm are shown, using actual camera-acquired silhouette data. A
detailed analysis of the sources of error is presented, demonstrating the effects of spatial quantization of
the original silhouettes and of the angular separation of successive silhouettes. It is shown that for a given
spatial resolution and local object curvature, an optimum angular separation between pairs of silhouette
views exists, and that reconstruction error increases with either a larger or smaller angular separation. The
convex hull boundary construction used in this work is shown to always use the best pair of silhouette
points for each hull vertex.
Rocks can be effectively used as landmarks for robot navigation through rocky terrains. For the robot
to do this, it has to be able to automatically build models of rocks. For the rocks world, models containing
qualitatively described surfaces are used. A rock is modeled as a graph. The surface of the rock is
decomposed into surface patches separated by crude edges. Each surface patch is represented by a node
in the graph. The arcs represent the adjacency relationships between the surface patches. To build such a
model, the following approach is taken.
For a scene consisting of a single approximately convex object, easily distinguishable from its background,
a silhouette of the object is obtained. The silhouette is partitioned into crude segments according
to the general shape of the segments. Each segment is typed as either concave, convex or straight. The
classification is done by measuring the mean and standard deviation of distances of points of the segment
from the straight line joining its ends.
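The segment typing step can be sketched as follows. The abstract states only that the mean and standard deviation of point-to-chord distances are used. The signed-distance convention (counter-clockwise traversal, outward bulges positive), the single tolerance threshold, and the function name are assumptions added for this illustration.

```python
import numpy as np

def classify_segment(points, straight_tolerance=1.0):
    """Label a silhouette segment 'straight', 'convex' or 'concave'.

    points: (N, 2) boundary points in traversal order. The silhouette is
    assumed to be traversed counter-clockwise, so points bulging away from
    the object interior have positive signed distance from the chord.
    Returns (label, mean_distance, std_distance).
    """
    points = np.asarray(points, dtype=float)
    start = points[0]
    chord = points[-1] - start
    length = np.hypot(chord[0], chord[1])
    if length == 0.0:
        raise ValueError("segment end points coincide")
    rel = points - start
    # Signed perpendicular distance of each point from the chord line.
    signed = (chord[1] * rel[:, 0] - chord[0] * rel[:, 1]) / length
    mean, std = float(signed.mean()), float(signed.std())
    if abs(mean) <= straight_tolerance and std <= straight_tolerance:
        return "straight", mean, std
    return ("convex" if mean > 0 else "concave"), mean, std
```

With these conventions, a quarter-circle arc of radius 10 traversed counter-clockwise is labelled convex, and the same arc traversed in the opposite direction is labelled concave. The tolerance would need tuning to the image resolution.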
The qualitative model for the rock is built by initially assuming that the silhouette is a cross-sectional
view of the rock. A simple cyclic graph consisting of nodes with surface types consistent with the segment
types is built. Thus a five-segment silhouette consisting of three convex segments, a straight segment and a concave segment
results in a graph with 5 nodes, three of which are convex surfaces, one flat (corresponding to the straight
silhouette segment) and the other concave. The model is improved by moving the camera to a different
position and obtaining another silhouette. From the positions of the camera and the segment types, either
new nodes are created or the surface types of the currently existing nodes are modified. A method for
automatically building such models is discussed.
Surface reconstruction from cross-sectional contours has become increasingly important in medical image applications.
In this paper, a method of reconstructing the surface of an object based on the descriptions of cross-sectional
contours has been developed. Each cross-sectional contour is first partitioned into convex/concave segments based on
the relative locations of points on the contour. Each segment of the contour is then described by a parametric cubic
polynomial, and the boundary is recovered based on the description. A matching technique incorporating possible
deformations between adjacent cross-sections is used to obtain the correspondence between adjacent cross-sectional
contours. Once the correspondence is established, the surface between corresponding segments of adjacent cross-sectional
contours is reconstructed by a traditional triangulation technique. As the reconstruction is based on the
shape of the cross-sections, the reconstructed surfaces are closer to those of the original object.
This paper reviews a physically-based approach to the reconstruction of 3D visual data over space, time,
and scale using deformable models. It summarizes three applications that exemplify the approach: visual
surface reconstruction, stereo correspondence matching, and the recovery of 3D shape and nonrigid motion.
In this paper we discuss the ongoing research on the problem of shape description, and decomposition of complex
objects in range images. We propose a paradigm for part description and segmentation by integration of contour,
surface, and volumetric primitives. Unlike previous approaches, we use geometric properties derived from
both boundary-based (surface contours and occluding contours), and primitive-based (biquadratic patches and
superquadric models) representations to define and recover part-whole relationships, without a priori knowledge
about the objects or the object domain. The descriptions thus obtained are independent of position, orientation,
scale, domain and domain properties, and are based purely on geometric considerations. We pose the problem
of integration in terms of evaluation of the intermediate descriptions and segmentation of the objects in a closed
loop process. We present algorithms for superquadric edge detection and apparent contour generation. The
criteria for the evaluation of the superquadric models are discussed and examples of real objects supporting our
approach are presented.
I describe a system that fits deformable models to range data. Models are represented
by using modal dynamics applied to volumetric primitives, which significantly improves
the computational complexity of both model recovery and subsequent processing. Given a
segmentation of the range data into parts (see reference ), a volumetric description is
obtained by a fitting procedure that minimizes squared error between the range measurements
and the model's visible surface. For simple part shapes it is possible to compute the
deformable model's parameters using only the shape of its symmetry axes.
Luminance changes in a TV image sequence can be interpreted as being due to a scene consisting
of 3D-objects moving with 6 degrees of freedom in 3D-space. The 3D-space is illuminated by a
light source and is observed by a camera. A parametric model is presented which employs an
explicit representation of the illumination source, the camera, and the 3D-objects. A method is
suggested, in which an Analysis-by-Synthesis-process automatically extracts the model parameters
from an incoming sequence of monocular TV images and thereby allows the modelling of the
3D scene. Since a parametric description of a complex scene resulting in a physically satisfactory
modelling cannot be achieved in all generality, we have concentrated on the modelling of quasi-
rigid, natural objects with homogeneous surfaces like the head-shoulder parts of human beings.
For object-oriented analysis-synthesis coding, an image analysis algorithm is required which automatically generates the parameter sets that describe moving 3D objects in an image sequence. Areas changed between two consecutive images are detected by means of change detection. Special processing is carried out to compute areas which coincide with object boundaries and to eliminate areas which represent illumination changes. These areas are segmented into silhouettes of moving objects and uncovered background. The border of an object silhouette is interpreted as the outermost contour of an object. These contours, in combination with a simple function giving the z-distance between them, provide a first estimate for the 3D shape of a model object. In order to improve the efficiency of motion analysis, a concept for combining model objects with new parts of moving objects is proposed. Results of an automatic image analysis based on moving 3D objects are shown using video telephone test sequences.
This contribution presents a method to segment video scenes hierarchically into different moving objects and
subobjects using a 2-dimensional description of these scenes. To this end, information from single images as well as
information from successive images is used to split up a scene into different objects. Furthermore, each of these objects
is characterized by a transform h(x, T) which implicitly describes the surface and the three-dimensional motion
of the moving objects in the scene. Using this description an object oriented prediction of the image contents from
one image to the next as it may be used in low bitrate image coding is possible.
We present a three-dimensional head motion detection system called a realtime headreader. This headreader
analyzes the head motion picture sequences taken by a TV-camera, and extracts the motion parameters in realtime,
i.e. 3-d rotations and translations. We used a simple but very fast algorithm, which exploits the contrast
of hair and face to recognize face orientation. The system extracts the head and face area, then estimates the
head motion parameters from the change in position of each area's centroids. The head motion is computed at
nearly 10 frames per second on a SUN4 workstation and the motion parameters are sent to an IRIS workstation
at 2.5 Kbps. The IRIS generates a head motion sequence that duplicates the original head motion. The entire
motion detection program is written in C language. No special image processing hardware is used, except for a
Our head motion detection system will enhance man-machine interactions by providing a new visual cue.
An operator will be able to point to a target by just looking at it; thus a mouse or 3-d tracking device is not
needed. The eventual goal of this research is to build an intelligent video communication system that codes the
information in terms of high level language rather than compressed video signals.