It is well known that laser time-of-flight (TOF) and optical triangulation are the most useful optical techniques for distance measurement. TOF is better suited to large distances, since short ranges require high laser-diode modulation frequencies (≈200-500 MHz). For short ranges, optical triangulation is simpler, as it is only necessary to read the projection of the laser spot on a linear optical sensor, without any laser modulation. Laser triangulation is based on rotating the object: this motion shifts the projected spot along the linear sensor, and reading out the whole sensor at each angular position yields the 3D information.
On the other hand, a hybrid method of triangulation and TOF can be implemented. In this case, a synchronized scanning of a laser beam over the object results in different arrival times of light to each pixel. The 3D information is carried by these delays. Only a single readout of the linear sensor is needed.
In this work we present the design of two different linear photodiode arrays in CMOS technology, the first based on optical triangulation and the second on this hybrid method (TFO). In contrast to PSDs (Position Sensitive Devices) and CCDs, CMOS technology can include photodiodes, control, and processing electronics on the same chip, which would otherwise have to be implemented with external microcontrollers.
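The triangulation geometry underlying the first sensor can be sketched as follows; the function name and the numeric values are illustrative examples, not figures from the paper:

```python
def triangulation_depth(baseline_m, focal_m, spot_offset_m):
    """Depth from the laser-spot position on the linear sensor.

    Simple pinhole geometry: the laser and the lens are separated by
    `baseline_m`; a spot at depth z projects `spot_offset_m` off the
    optical axis, so z = baseline * focal / offset.
    """
    if spot_offset_m <= 0:
        raise ValueError("spot must project off-axis")
    return baseline_m * focal_m / spot_offset_m
```

With a 100 mm baseline and a 50 mm lens, a spot projecting 1 mm off-axis corresponds to a depth of about 5 m.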
By setting a refractor with a certain angle against the optical axis of the CCD camera lens, the image of a measuring point recorded on the image plane is displaced by the corresponding amounts related to the distance between the camera and the measuring point. When the refractor that keeps the angle against the optical axis is rotated physically at high speed during the exposure of the camera, the image of a measuring point draws a circular streak. Since the size of the circular streak is inversely proportional to the distance between the camera and the measuring point, the 3D position of the measuring point can be obtained by processing the streak.
When the measuring point is moving relative to the camera, it draws a spiral streak on the image plane, since the circular shift is added to the point's own movement. The size of the spiral streak again relates to the depth of the measuring point, the pitch of the spiral relates to the velocity parallel to the image plane, and the variation rate of the spiral size relates to the velocity along the optical axis of the camera.
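The stated inverse proportionality between streak size and depth can be sketched as below; the calibration constant `k` and the function names are hypothetical placeholders for whatever calibration the real system uses:

```python
def depth_from_streak(radius_px, k):
    """Streak radius is inversely proportional to depth: r = k / z,
    so z = k / r (k is a hypothetical calibration constant)."""
    return k / radius_px

def axial_velocity(r0, r1, dt, k):
    """The variation rate of the streak size over time dt gives the
    velocity component along the optical axis."""
    return (depth_from_streak(r1, k) - depth_from_streak(r0, k)) / dt
```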
Image changes produced by a camera motion are an important source of information on the structure of the environment. 3D-shape-recovery from an
image sequence has been studied intensively, and many methods have been proposed. In theory these methods are sound, but
they are very sensitive to noise, so that in many practical situations satisfactory results cannot be obtained. The difficulty comes from the fact
that, in some cases, the discrimination of small rotation around an axis perpendicular to the optical axis and small translation along the tangent
to the rotational motion is difficult. In the present paper, in order to improve the accuracy of recovery, we propose a method for recovering the
object shape based on sensor fusion technique. The method uses a video camera and a gyro sensor. The gyro sensor, mounted on the video camera,
outputs 3-axial angular velocity. It is used to compensate optical flow information from the video camera. We selected this sensor because it does
not require any setting up in the environment, so we can carry it anywhere we want. We have built an experimental system and obtained fairly good
results. We also report a statistical analysis of our method.
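The gyro-compensation step described above can be sketched with the standard small-rotation flow model; this is a generic illustration under assumed pinhole geometry, not the authors' implementation:

```python
def rotational_flow(x, y, f, omega):
    """Image-plane flow induced purely by camera rotation, for pixel
    coordinates (x, y) relative to the principal point, focal length f
    in pixels, and omega = (wx, wy, wz) in rad/s (standard
    instantaneous-motion model)."""
    wx, wy, wz = omega
    u = (x * y / f) * wx - (f + x * x / f) * wy + y * wz
    v = (f + y * y / f) * wx - (x * y / f) * wy - x * wz
    return u, v

def compensate(flow_u, flow_v, x, y, f, omega_gyro):
    """Subtract the gyro-predicted rotational component, so the residual
    flow is due to translation and therefore carries depth information."""
    ru, rv = rotational_flow(x, y, f, omega_gyro)
    return flow_u - ru, flow_v - rv
```

At the principal point the rotational flow reduces to (-f*wy, f*wx), which makes the rotation/translation ambiguity mentioned above explicit: a small wy is indistinguishable there from a horizontal translation.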
This paper presents a novel system for reconstructing the 3D structure of free-form objects using three 2D images acquired simultaneously. The approach is to first describe the 3D-from-2D problem in projective space in which structure is defined relative to some virtual plane. The projective approach allows combining the two families of parameters into one set that can be recovered linearly in a straightforward manner. The linearity of the process also ensures a unique and stable solution, which is a key factor for obtaining automation of the measurement process. Moreover, the geometric and algebraic constraints of the 3D-from-2D problem, including the combination with photometric constraints, are described by a unique family of multi-linear equations whose coefficients form a tensor, known as the "tri-linear tensor". One of the key features of the tensor is that it describes the 3D-from-2D constraints under all possible situations, i.e., it is not subject to any form of singularities. The system has been implemented as a portable non-contact 3D-measurement device, which is capable of measuring objects with accuracy levels ranging from 30-100 microns for the entire object. Being a robust and portable system that can be placed close to the measured objects, it covers large areas with single digital images. The outputs provided by the system can be surfaces, edges, holes, cross-sections and other reports as requested by users. The system can be used for numerous applications such as design (digitization of clay models), die and mold correction, parts measurement, prototype assembly, quality control and reverse engineering. The benefits over previous techniques, such as contact measurement ones, are the portability, capability of operating in uncontrolled environments such as production lines and the much higher throughput.
A multi-camera 3D modeling system to digitize a human head and body is presented in this paper. The main features of this system are as follows: 1) Fast capturing: Both texture images and pattern images can be taken within a few seconds using multiple digital still cameras set around the target person; slide projectors are also set up to cast color line-patterned light on the target for pattern-image capturing. 2) Realistic shape and texture: The whole shape and photo-realistic textures of the human head, including hair, can be digitized at once on a personal computer. 3) Hybrid algorithm: Our modeling algorithm is based on a hybrid method in which the Shape-from-Silhouette technique and the Active-Stereo technique are combined. In the first step, the rough shape of the target is estimated in a voxel space using our Extended Shape-from-Silhouette method. In the next step, the shape is refined based on depth-map data calculated using a multi-camera active-stereo method. This combination makes up for the shortcomings of each method. Our system has been applied to digitizing several Japanese people, using sixteen cameras for texture-image capturing and twelve cameras and two projectors for pattern-image capturing. Its capturing time is approximately three seconds, and the calculation time to digitize the whole shape and texture of the human head and body is about 15-20 minutes on a personal computer with a Pentium-III processor (600 MHz) and 512 MB of memory.
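The silhouette step of such a hybrid pipeline can be sketched as a voxel-occupancy test; this is a minimal generic sketch, not the paper's Extended Shape-from-Silhouette method, and the camera model here is an assumed 3x4 projection matrix:

```python
import numpy as np

def carve_by_silhouettes(voxels, cameras, silhouettes):
    """Keep a voxel only if it projects inside the silhouette in every
    view. `cameras` are 3x4 projection matrices and `silhouettes` are
    boolean foreground masks. The active-stereo refinement stage is
    not shown."""
    kept = []
    for v in voxels:                       # v = (x, y, z)
        homog = np.append(v, 1.0)
        inside_all = True
        for P, mask in zip(cameras, silhouettes):
            u, w, s = P @ homog
            col, row = int(u / s), int(w / s)
            if not (0 <= row < mask.shape[0] and 0 <= col < mask.shape[1]
                    and mask[row, col]):
                inside_all = False
                break
        if inside_all:
            kept.append(v)
    return kept
```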
At 3DV Systems Ltd. we developed and built a true 3D video camera (Zcam), capable of producing RGB and D signals where D stands for distance or depth to each pixel.
The new RGBD camera makes it possible to do away with color-based background substitution, known as chroma-keying, and to create a whole gallery of new effects and applications, such as multilayer foreground and background substitutions and manipulations.
The new multilayered modality makes possible the production of mixed-reality real-time video as well as post-production manipulation of recorded video.
The new RGBD camera is scannerless and uses low-power laser illumination to create the D channel. Prototypes have been in use for more than 2 years and are capable of sub-centimeter depth resolution at any desired distance up to 10 m on the present model. Additional potential applications as well as low-cost versions are currently being explored.
This paper investigates superquadrics-based object representation of complex scenes from range images. The issues of how the recover-and-select algorithm is incorporated to handle complex scenes containing backgrounds and multiple occluded objects are addressed in turn. For images containing backgrounds, the raw image is first coarsely segmented using the scan-line grouping technique. An area threshold is then applied to remove the backgrounds while keeping all the objects. After this pre-segmentation, the recover-and-select algorithm is applied to recover superquadric (SQ) models. For images containing multiple occluded objects, a circle-view strategy is taken to recover complete SQ models from range images in multiple views. First, a view path is planned as a circle around the objects, on which images are taken approximately every 45 degrees. Next, SQ models are recovered from each single-view range image. Finally, the SQ models from multiple views are registered and integrated. These approaches are tested on synthetic range images. Experimental results show that accurate and complete SQ models are recovered from complex scenes using our strategies. Moreover, the approach handling background problems is insensitive to pre-segmentation error.
The factorization method is known to be robust and efficient for recovering shape and motion from an image sequence by applying Singular Value Decomposition to the tracking matrix. To obtain all-around 3-D data of an object, images covering the object's entire circumference must be taken. This means that a long image sequence is required, and, because of occlusion, there is almost no feature point that can be tracked throughout all frames. Consequently a large tracking matrix is acquired in which most elements are unknown, and it is impractical to apply the conventional factorization method directly to such a matrix. Instead, the tracking matrix is first divided into sub-matrices with overlapping portions. After the unknown elements are estimated in each sub-matrix, the factorization method is applied to each sub-matrix to recover the partial 3-D data. The partial 3-D data are then integrated into a whole according to the overlapping portions of each pair of sub-matrices. By modifying the factorization method in this split-and-merge manner, not only can the all-around 3-D data be recovered, but the computation time is also decreased dramatically.
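The core SVD factorization applied to each sub-matrix can be sketched as follows; this is the standard rank-3 decomposition (up to an affine ambiguity), not the paper's full split-and-merge pipeline:

```python
import numpy as np

def factorize(W):
    """Rank-3 factorization of a registered tracking matrix W (2F x P)
    into motion M (2F x 3) and shape S (3 x P), up to an affine
    ambiguity. W must already have its per-frame centroids subtracted."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    M = U[:, :3] * np.sqrt(s[:3])
    S = np.sqrt(s[:3])[:, None] * Vt[:3]
    return M, S

# Synthetic check: a random rank-3 tracking matrix is recovered exactly.
rng = np.random.default_rng(0)
M_true = rng.standard_normal((8, 3))    # 4 frames, 2 rows per frame
S_true = rng.standard_normal((3, 20))   # 20 feature points
W = M_true @ S_true
M, S = factorize(W)
print(np.allclose(M @ S, W))  # True
```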
We present a new algorithm that reconstructs 3D shapes from 2D images. Unlike previous algorithms that produce points or voxels representing the shapes, our algorithm produces their polygonal approximations. Such approximations are simpler, easier to manipulate, and more suitable for many graphics applications.
The algorithm consists of the following steps. First, we project the 2D images onto a plane, called the "test plane." The test plane is chosen at random from the set of all planes that pass through the volume where the object to be reconstructed exists. Next, we compare the projected images with each other. If the test plane is tangential to the object, the images will coincide at some regions on the plane. We then represent such regions using texture-mapped polygons. By collecting these polygons together, we can obtain a polygonal approximation of the 3D object.
One important feature of our algorithm is its ability to incrementally refine the resulting approximations. This feature has turned out to be especially useful when the objects' shapes are very complex. We also show examples demonstrating that our algorithm copes well with partly noisy images.
In this paper, we propose a system to reconstruct 3D face models from monocular image sequences. Our approach is based on adapting a generic 3D face model to a set of sparse 3D data points of face features recovered from a video sequence. In our system, structure from motion is accomplished by a robust least-squares minimization approach based on dynamically minimizing a weighted least-squares energy function. A small number of face feature points are selected and tracked along the video sequence. The face poses at all frames in the sequence are approximated by a pose-estimation process using the generic 3D face model. A structure-from-motion algorithm based on robust least-squares minimization is applied to the entire video sequence to recover the face structure. The adaptation of the generic 3D head model to the recovered 3D face structure is achieved using radial basis function interpolation. Experimental results of 3D face model recovery using the proposed algorithm are shown.
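The RBF-based adaptation step can be sketched as fitting radial-basis weights that interpolate the recovered feature displacements exactly and then evaluating them at every model vertex; the Gaussian kernel and the value of sigma are assumptions for illustration (the paper does not specify them here):

```python
import numpy as np

def rbf_fit(centers, values, sigma=1.0):
    """Solve for Gaussian RBF weights so the interpolant passes exactly
    through the sparse 3D feature points (`centers` -> `values`)."""
    d2 = ((centers[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    Phi = np.exp(-d2 / (2 * sigma ** 2))
    return np.linalg.solve(Phi, values)

def rbf_eval(points, centers, weights, sigma=1.0):
    """Evaluate the fitted interpolant, e.g. at every generic-model
    vertex, to deform the model toward the recovered structure."""
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2)) @ weights
```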
In order to improve speech recognition efficiency under background noise, such as outdoors, we propose a new recognition technology combined with three-dimensional lip movements. In this paper, firstly, three-dimensional movements at four positions on the mouth were measured, and principal component analysis was used to clarify which positions and directions are the main contributors to pronunciation. Secondly, recognition evaluation tests on 50 Japanese words were carried out under noise levels ranging from 40 to 80 dB. In the experiments, over 80% recognition efficiency was measured at 70 dB, an improvement of 40% over ordinary speech recognition. These experimental results indicate that the proposed method can be developed into a practical speech recognition technology. Finally, remaining research topics were identified, such as improving the precision of lip-movement measurement and carrying out experiments and data collection outdoors.
This paper presents the digital imaging results of a collaborative research project working toward the generation of an on-line interactive digital image database of signs from ancient cuneiform tablets. An important aim of this project is the application of forensic analysis to the cuneiform symbols to identify scribal hands.
Cuneiform tablets are amongst the earliest records of written communication, and could be considered as one of the original information technologies; an accessible, portable and robust medium for communication across distance and time. The earliest examples are up to 5,000 years old, and the writing technique remained in use for some 3,000 years. Unfortunately, only a small fraction of these tablets can be made available for display in museums and much important academic work has yet to be performed on the very large numbers of tablets to which there is necessarily restricted access.
Our paper will describe the challenges encountered in the 2D image capture of a sample set of tablets held in the British Museum, explaining the motivation for attempting 3D imaging and the results of initial experiments scanning the smaller, more densely inscribed cuneiform tablets. We will also discuss the tractability of 3D digital capture, representation and manipulation, and investigate the requirements for scalable data compression and transmission methods. Additional information can be found on the project website: www.cuneiform.net
Image mosaicing has been attracting widespread attention because it
can automatically construct a panoramic image from multiple images.
Among previous methods, homography-based methods are the most accurate
in the geometric sense. This is because these methods use planar
projective transformation, which considers perspective effects as a
geometric transformation model between images. These methods, however,
have a problem of misregistration in the case of general scenes with
arbitrary camera motion. We propose a method that can reduce this
misregistration by using geometric constraints called
trilinearity. Trilinearity is a geometric relationship among three
images taken from different viewpoints. By using this relationship,
several techniques have already been introduced for other purposes,
such as 3D shape recovery or motion analysis. We use this relationship
for image mosaicing. The proposed method consists of the following
three steps. First, it establishes feature correspondences among three
images. We use small rectangular regions such as corners as
features. Second, it computes the trilinearity from the feature
correspondences. We use a robust method to exclude false
correspondences. Third, it generates a panoramic image mosaic by using
the trilinearity. Experiments using real images confirm the
effectiveness of our method.
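The robust exclusion of false correspondences mentioned above is typically done with a RANSAC-style scheme. The toy sketch below uses a pure-translation motion model so that a single correspondence suffices as a minimal sample; the paper estimates a trilinearity instead, so this only illustrates the robust-rejection idea, not the actual model:

```python
import numpy as np

def ransac_translation(pts_a, pts_b, n_iter=200, tol=2.0, seed=0):
    """Flag correspondences that disagree with the consensus motion.
    pts_a, pts_b: (N, 2) arrays of matched feature positions."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(pts_a), dtype=bool)
    for _ in range(n_iter):
        i = rng.integers(len(pts_a))
        t = pts_b[i] - pts_a[i]                 # minimal sample: 1 match
        err = np.linalg.norm(pts_a + t - pts_b, axis=1)
        inliers = err < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # refit on the consensus set
    t = (pts_b[best_inliers] - pts_a[best_inliers]).mean(axis=0)
    return t, best_inliers
```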
Noise smoothing is a very important step in early vision. Many signals, such as intensity images and range images, are now widely used in 3D reconstruction, but the observed data are corrupted by many different sources of noise and often need to be preprocessed before further use. This research proposes a novel adaptive regularized noise-smoothing method for dense range images using directional Laplacian operators. In general, dense range data contain heavy noise, including both Gaussian and impulsive noise. Although existing regularized smoothing algorithms can easily remove Gaussian noise, impulsive noise is not easy to remove from observed range data. In addition, to address artifacts in edge regions produced by conventional regularized smoothing of range data, a second smoothness constraint is applied by minimizing the difference between the median-filtered data and the original data. As a result, the proposed algorithm can effectively remove noise from dense range data while preserving directional edges.
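The combination of a regularization term with a median-based constraint can be sketched as gradient descent on a composite energy. This is a simplified illustration: the ordinary 5-point Laplacian stands in for the paper's directional operators, and all weights are made-up values:

```python
import numpy as np

def laplacian(d):
    """5-point Laplacian with replicated borders."""
    p = np.pad(d, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4 * d

def median_filter(d, win=3):
    """Naive 2D median filter with replicated borders."""
    r = win // 2
    p = np.pad(d, r, mode="edge")
    out = np.empty_like(d)
    for i in range(d.shape[0]):
        for j in range(d.shape[1]):
            out[i, j] = np.median(p[i:i + win, j:j + win])
    return out

def smooth_range(d_obs, lam=0.05, mu=1.0, n_iter=200, step=0.01):
    """Gradient descent on
       E(d) = ||d - d_obs||^2 + lam*||Lap(d)||^2 + mu*||d - med(d_obs)||^2
    where the median term pulls impulsive outliers toward their
    neighbourhood median."""
    med = median_filter(d_obs)
    d = d_obs.astype(float).copy()
    for _ in range(n_iter):
        grad = (2 * (d - d_obs) + 2 * lam * laplacian(laplacian(d))
                + 2 * mu * (d - med))
        d -= step * grad
    return d
```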
Dense estimation of surface reflectance parameters from registered range and color images by determining illumination conditions, Proc. SPIE 4298 (13 April 2001); https://doi.org/10.1117/12.424897
A texture image has often been used to reproduce real objects in
computer graphics (CG). However, the appearance of the object is not
reproduced appropriately when lighting conditions of real and CG
environments are not consistent. To overcome the problem, we propose a
new method for estimating non-uniform reflectance properties, which
consist of diffuse reflectance, specular reflectance, and surface
roughness parameters, for surfaces with both convex and concave
regions. Such a surface
reflectance estimation requires object surface normal, light source
position, camera position, and surface color under different
illumination conditions for each surface point (pixel). Therefore, we
use a laser rangefinder which takes accurately registered range and
color images of an object.
It is difficult to estimate the specular reflectance parameter, since
the specular reflection is observed only within a narrow range of
angles among the light source, the surface, and the camera. In our
method, an algorithm is proposed to determine light source positions
with which the specular reflection component is strongly observed over
the surface. We also take self-shadowing into account and stabilize the
calculation by separating the two reflection components. The Torrance-Sparrow
model, which accurately represents object reflectance properties, is
employed to estimate reflectance parameters by using color images
under multiple illumination conditions.
In our experiments, a measured object has partially different specular
reflection and surface roughness parameters. Experiments show the
usefulness of the proposed method.
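A simplified form of the Torrance-Sparrow model referred to above can be sketched as a Lambertian term plus a roughness-controlled specular lobe; folding the geometric-attenuation and Fresnel factors into `ks` is a simplification for illustration, not the paper's exact formulation:

```python
import numpy as np

def torrance_sparrow(n, l, v, kd, ks, sigma):
    """Simplified Torrance-Sparrow reflectance: diffuse term kd*(n.l)
    plus a specular lobe that is Gaussian in the angle between the
    surface normal n and the half vector of light direction l and view
    direction v (roughness sigma in radians)."""
    n, l, v = (x / np.linalg.norm(x) for x in (n, l, v))
    h = (l + v) / np.linalg.norm(l + v)
    alpha = np.arccos(np.clip(n @ h, -1.0, 1.0))
    diffuse = kd * max(n @ l, 0.0)
    specular = ks * np.exp(-alpha ** 2 / (2 * sigma ** 2)) / max(n @ v, 1e-6)
    return diffuse + specular
```

In the mirror configuration (light and camera both along the normal) the half vector coincides with the normal, so the output is simply kd + ks, which is why light-source positions that excite strong specular reflection are so informative for separating the two components.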
This paper presents a stereo matching approach based on topological and geometrical cooperation. Our approach has two aims: first, to reduce the complexity of the matching, and second, to obtain structured matches that allow CAD modeling of the 3D scene. We propose to represent each view with a well-known tool from geometric modeling and CAD, the combinatorial map, which is a boundary representation (B-Rep). Following this principle, topological information structured by combinatorial maps is associated with the views. In addition, each map contains geometrical information about the primitives to be matched. Matching then consists of using together a topological process, which exploits the topological structure to propose matches, and a geometrical process, which validates these matches or adjusts the maps. In the end, the obtained matches are directly structured by a combinatorial map, which represents the model of the matched scene; by closing faces in this model, we find new matches. Structuring the scene at the beginning of the feature-matching process thus improves matching performance and eases the modeling of the final 3D scene.
Silhouette-based reconstruction is a simple and robust algorithm for estimating the 3D volume of an object. However, it has two main drawbacks: an insufficient number of viewing positions and the inability to detect concave regions. Starting from an initial convex hull of the object to be modeled, generated by a silhouette-based reconstruction, an algorithm based on photoconsistency is proposed. The algorithm carves away the excess volume elements using multi-baseline stereo information. The result of the described algorithm is demonstrated on a synthesized object in an artificial scene.
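The photoconsistency test at the heart of such carving can be sketched as follows; the `sample_colors` callback and the threshold value are hypothetical stand-ins for the projection and multi-baseline stereo machinery:

```python
import numpy as np

def photoconsistent(colors, tau=10.0):
    """A voxel on the true surface should look similar in every camera
    that sees it; excess volume usually projects to unrelated colors.
    `colors` is an (n_views, 3) array of sampled RGB values; the voxel
    is kept if the per-channel standard deviation stays below `tau`."""
    return bool(np.all(np.std(np.asarray(colors, dtype=float), axis=0) < tau))

def carve(voxels, sample_colors, tau=10.0):
    """Remove voxels whose projected colors are inconsistent across
    views. `sample_colors(v)` is assumed to project voxel v into every
    image and return the sampled colors."""
    return [v for v in voxels if photoconsistent(sample_colors(v), tau)]
```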
The problem of acquiring 3D data of a human face arises in face recognition, virtual reality, and many other applications. It can be solved using stereovision, a technique that acquires three-dimensional data from two cameras. The aim is to implement an algorithmic chain that makes it possible to obtain a three-dimensional space from two two-dimensional spaces: the two images coming from the two cameras. Several implementations have already been considered. We propose a new, simple real-time implementation based on a multiprocessor (FPGA-DSP) approach suitable for embedded processing. We then present our method, which provides a dense and reliable depth map of the face and can be implemented on an embedded architecture. A study of various architectures led us to a judicious choice that yields the desired result. The real-time processing is implemented on an embedded architecture. We obtain a dense facial disparity map, precise enough for the intended applications (multimedia, virtual worlds, biometrics), using a reliable method.
A photogrammetric evaluation system for the precise determination of 3D coordinates from blocks of large metric images is presented. First, the motivation for the development is given; it lies in the field of processing tools for photogrammetric evaluation tasks. As the use and availability of digital metric images rapidly increase, corresponding equipment for the measuring process is needed. Systems developed up to now are either very specialized ones, based on high-end graphics workstations with corresponding pricing, or simple ones with restricted measuring functionality. A new conception is presented that avoids special high-end graphics hardware while providing a complete processing chain for all elementary photogrammetric tasks, ranging from preparatory steps, through the formation of image blocks, up to automatic and interactive 3D evaluation within digital stereo models. The presented system is based on PC hardware equipped with off-the-shelf graphics boards and uses an object-oriented design. The specific needs of a flexible measuring system and the corresponding requirements the system has to meet are discussed. Important aspects such as modularity and hardware independence, and their value for the solution, are shown. The design of the software is presented, and first results with a prototype realised on a powerful PC-hardware configuration are featured.
We present a rapid whole-field 3-D imaging technique based on low-coherence interferometry using photorefractive holography in semi-insulating Multiple Quantum Well (MQW) devices, capable of whole-field depth-resolved 3-D imaging at frame rates exceeding 475 fps. Photorefractive holography provides a unique mechanism to discriminate against a diffuse light background, making it attractive for imaging through turbid media, e.g. for biomedical applications. We note that this whole-field technique can exploit sources of almost arbitrary spatial coherence, including LEDs, fibre-coupled laser diode arrays, and broadband CW lasers, as well as ultrafast laser pulses. The use of spatially incoherent light greatly reduces the deleterious impact of speckle.
Most current 3-D head scanners cannot capture a complete surface of the head due to limitations in viewing coverage. As a postprocessing aid, we developed an automated method for approximating the top of the head surface. The top-of-head surface is usually the largest void area in a 360-degree head scan such as those obtained with a Cyberware PS head scanner. In this paper, we describe a two-step B-spline curve/surface approximation process to reconstruct the top of the head from the raw data set.
The traditional paper-surface characterisation methods, for example those based on air-stream leakage (Bendtsen, Parker Print Surf), are facing severe limitations. These traditional methods are better suited as indicators of erroneous production than for grading paper samples with respect to their print-quality potential. It has been acknowledged that this research problem cannot be addressed without taking the paper's three-dimensional structure into account. In this work, we use a confocal image of the paper surface, obtained by imaging either a pinhole or a structured light pattern through a very high numerical aperture optical system onto the surface of the paper to be measured.
To analyse the 3D image of the paper we perform a multiresolution analysis. This means that a given signal is decomposed into a coarse approximation plus added details; applying the successive approximations recursively makes the approximation error go to zero. Using multiresolution analysis and orthonormal wavelet bases, we construct a wavelet-based algorithm that allows us to characterise the paper surface and grade paper samples with respect to their print-quality potential.
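The decomposition just described can be sketched with the orthonormal Haar wavelet, the simplest such basis; the paper does not specify which wavelet it uses, so this is purely illustrative:

```python
import numpy as np

def haar_analysis(signal, levels):
    """1-D Haar multiresolution analysis: at each level the signal is
    split into a coarse approximation and detail coefficients. For a
    surface-height profile, the detail energy at each level
    characterises roughness at that scale."""
    a = np.asarray(signal, dtype=float)
    details = []
    for _ in range(levels):
        approx = (a[0::2] + a[1::2]) / np.sqrt(2)
        detail = (a[0::2] - a[1::2]) / np.sqrt(2)
        details.append(detail)
        a = approx
    return a, details

def haar_synthesis(approx, details):
    """Invert the analysis: successive approximations reconstruct the
    signal exactly, so the approximation error goes to zero."""
    a = approx
    for d in reversed(details):
        out = np.empty(2 * len(a))
        out[0::2] = (a + d) / np.sqrt(2)
        out[1::2] = (a - d) / np.sqrt(2)
        a = out
    return a
```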
An efficient algorithm is presented for estimating interframe distances in a 2D frame sequence acquired by freehand scanning for the reconstruction of a 3D ultrasound image. Since interframe distances in a 2D frame sequence obtained with a hand-held ultrasound probe are not uniform, a 3D image directly reconstructed from such 2D data can substantially deviate from the real form of the human organs. Accordingly, to estimate the interframe distances in a 2D frame sequence, block-based lateral correlation functions are determined in each frame, and each block-based lateral correlation function is assumed to be identical to the interframe correlation function of that block. Based on this assumption, the interframe distance between each image block and the corresponding block of the adjacent frame is estimated. Finally, the interframe distance of each adjacent frame pair is estimated by averaging the estimated block-wise interframe distances. Experimental results showed that the proposed algorithm was effective in estimating the interframe distances of the test sequences and that the 3D images reconstructed using the proposed method were nearly identical to the original ones.
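The block-correlation-and-average scheme can be sketched as below. The exponential decorrelation curve `corr = exp(-d / d0)` is a hypothetical calibration inserted for illustration; a real system would measure the correlation-versus-distance relationship for its own probe:

```python
import numpy as np

def block_correlation(f0, f1, block=16):
    """Normalised cross-correlation between corresponding blocks of two
    adjacent frames."""
    h, w = f0.shape
    corrs = []
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            a = f0[i:i + block, j:j + block].ravel()
            b = f1[i:i + block, j:j + block].ravel()
            a = a - a.mean()
            b = b - b.mean()
            denom = np.linalg.norm(a) * np.linalg.norm(b)
            if denom > 0:
                corrs.append(float(a @ b / denom))
    return corrs

def interframe_distance(f0, f1, d0=1.0, block=16):
    """Average the block-wise distances obtained by inverting the
    (hypothetical) calibration curve corr = exp(-d / d0)."""
    ds = [-d0 * np.log(max(c, 1e-6))
          for c in block_correlation(f0, f1, block)]
    return float(np.mean(ds))
```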
Three-dimensional object reconstruction from two-dimensional images has attracted much attention due to its wide range of practical applications. Recently, various efficient 3D reconstruction methods have been developed. Most conventional methods, however, exhibit certain disadvantages, such as the complex computation needed to find disparity and correspondence and the huge number of input images needed for high-resolution reconstruction. The proposed method reconstructs a 3D object from a significantly reduced number of input images based on interpolation theory. Using two input images taken from different angles, an image seen from any angle in between can be obtained by the proposed 3D interpolation method. For simple objects such as a cylinder or a cube, if we know some parameters, including the radius of the object and the camera rotation angle between input images, we do not need to find disparity to reconstruct the 3D structure. With the proposed method, we can obtain 3D images without a significant amount of computation. The proposed method can be used in developing inexpensive computed tomography systems, 3D X-ray inspection systems, and virtual prototyping products.
The correspondence problem in image matching is an ill-defined one. It is difficult to match two stereo images to produce an accurate depth map without applying some sort of constraints to the matching process. Matching is made especially difficult near discontinuities and occlusions in the images.
A popular method of applying constraints to image matching is energy minimisation. However, this technique is computationally expensive and is not guaranteed to finish at an optimal solution.
This paper describes the use of a least cost path finding algorithm called the Viterbi algorithm as an alternative to energy minimisation. The Viterbi algorithm operates on individual horizontal scanlines and uses a cost function to find the optimum "path" of nodes through disparity space from one side of the image to the other. Constraints can be applied by restricting the possible movements of the path or by modifying the cost function. The Viterbi algorithm, unlike energy minimisation, is not an iterative process and is guaranteed to find the path that has the least possible cost.
The implementation of the Viterbi algorithm described in this paper uses constraints that were developed to make the image matching robust in the presence of discontinuities and occlusions. Results are shown for both synthetic and real-world stereo pairs.
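The least-cost-path idea can be sketched as a small dynamic program over one scanline; the absolute-difference node cost and the linear disparity-change penalty below are simple stand-ins for the paper's discontinuity and occlusion constraints:

```python
import numpy as np

def viterbi_scanline(left, right, max_disp, smooth=1.0):
    """Least-cost disparity path for one scanline.
    Node cost: absolute intensity difference between left[x] and
    right[x - d]. Transition cost: `smooth` per unit of disparity change
    between neighbouring pixels. Unlike iterative energy minimisation,
    the returned path is guaranteed to have the least possible cost."""
    n = len(left)

    def node(x, d):
        return abs(float(left[x]) - float(right[x - d])) if x - d >= 0 else 1e9

    cost = np.zeros((n, max_disp + 1))
    back = np.zeros((n, max_disp + 1), dtype=int)
    for d in range(max_disp + 1):
        cost[0, d] = node(0, d)
    for x in range(1, n):
        for d in range(max_disp + 1):
            trans = cost[x - 1] + smooth * np.abs(np.arange(max_disp + 1) - d)
            back[x, d] = int(np.argmin(trans))
            cost[x, d] = node(x, d) + trans[back[x, d]]
    # backtrack the least-cost path through disparity space
    disp = np.zeros(n, dtype=int)
    disp[-1] = int(np.argmin(cost[-1]))
    for x in range(n - 1, 0, -1):
        disp[x - 1] = back[x, disp[x]]
    return disp
```

On a synthetic pair where the left scanline is the right one shifted by two pixels, the recovered path settles at disparity 2 once valid correspondences exist.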
Recently, interest in 3D images, generated from range data and CAD, has greatly increased, and various 3D image databases are accordingly being constructed. An efficient and fast scheme to access the desired image data is an important issue for Internet and digital-library applications. However, 3D image databases are difficult to manage because of their huge size, so a proper descriptor is necessary to manage the data efficiently, including content-based search. In this paper, the proposed shape descriptor is based on voxelization of the 3D image. The medial axis transform, which stems from mathematical morphology, is performed on the voxelized 3D image, and a graph composed of nodes and edges is generated from the skeletons. The generated graph is well suited as a shape descriptor because it loses no geometric information and matches human intuition about shape. The proposed shape descriptor should therefore be useful for 3D object recognition, compression, and content-based search.
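The medial-axis extraction on a voxelized shape can be sketched with a breadth-first distance transform whose ridge cells approximate the skeleton; this 2D sketch is a simplified illustration (the voxel case adds the z neighbours), not the paper's exact transform:

```python
import numpy as np
from collections import deque

def distance_transform(occ):
    """Integer 4-neighbour distance from each occupied cell to the
    nearest background cell, by breadth-first search."""
    dist = np.where(occ, -1, 0)
    q = deque(zip(*np.nonzero(~occ)))
    while q:
        i, j = q.popleft()
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if (0 <= ni < occ.shape[0] and 0 <= nj < occ.shape[1]
                    and dist[ni, nj] == -1):
                dist[ni, nj] = dist[i, j] + 1
                q.append((ni, nj))
    return dist

def medial_axis_nodes(occ):
    """Cells whose distance is a local maximum approximate the medial
    axis; such skeleton cells would become the nodes of the shape
    graph."""
    d = distance_transform(occ)
    p = np.pad(d, 1, constant_values=0)
    neigh = np.stack([p[1:-1, :-2], p[1:-1, 2:], p[:-2, 1:-1], p[2:, 1:-1]])
    return occ & (d >= neigh.max(axis=0)) & (d > 0)
```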