We propose a marker-less AR method that uses line segment features. Estimating camera poses is an important part of any AR system. In most conventional marker-less AR systems, feature point matching between a model and its image is required for camera pose estimation. However, a sufficient number of corresponding points cannot always be detected to estimate accurate camera poses. To solve this problem, we propose the use of line segment features, which can often be detected even when only a few feature points are available. In this paper, we propose a marker-less AR system that uses line segment features for camera pose estimation. In this system, we propose a novel descriptor of the line segment feature for fast camera pose estimation. We also construct a database containing a k-d tree of line features and 3D line segment positions for finding 2D-3D line segment correspondences between the input image and the database, so that we can estimate the camera pose and perform AR. We demonstrate that the proposed method can estimate the camera pose and provide robust marker-less AR in situations where point matching methods fail.
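The 2D-3D correspondence lookup described above can be sketched as follows. The descriptor values and database entries are invented for illustration, and a brute-force nearest-neighbor scan stands in for the k-d tree used in the actual system.

```python
import math

# Hypothetical database: each entry pairs a line-segment descriptor
# (here a short illustrative vector) with the segment's 3D endpoints.
database = [
    ((0.9, 0.1, 0.2), ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0))),
    ((0.1, 0.8, 0.3), ((0.0, 0.0, 0.0), (0.0, 1.0, 0.0))),
    ((0.2, 0.2, 0.9), ((0.0, 0.0, 0.0), (0.0, 0.0, 1.0))),
]

def match_line(query_descriptor):
    """Return the 3D segment whose descriptor is nearest to the query.

    The paper builds a k-d tree over the descriptors for speed; a linear
    scan yields the same correspondences and keeps the sketch short."""
    best, best_dist = None, float("inf")
    for descriptor, segment_3d in database:
        d = math.dist(query_descriptor, descriptor)
        if d < best_dist:
            best, best_dist = segment_3d, d
    return best

# A descriptor extracted from a detected 2D segment retrieves the
# corresponding 3D segment, giving one 2D-3D line correspondence.
print(match_line((0.85, 0.15, 0.25)))  # → ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
```

Collecting several such correspondences is what allows the camera pose to be estimated even when point matching fails.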
An efficient system that upsamples depth maps captured by a Microsoft Kinect while jointly reducing the effect of noise is presented. The upsampling is carried out by detecting and exploiting the piecewise locally planar structure of the downsampled depth map, guided by the corresponding high-resolution RGB image. The amount of noise is reduced by simultaneously accumulating the downsampled data. By exploiting the massively parallel computing capability of modern commodity GPUs, the system is able to maintain a high frame rate. Our system produces an upsampled depth map that is very close to the original depth map both visually and numerically.
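The piecewise locally planar idea can be illustrated with a minimal sketch: fit a plane to a few low-resolution depth samples and evaluate it at high-resolution pixel positions. The sample values and patch layout are invented here; the actual system also uses the RGB image to delimit planar regions and accumulates frames over time.

```python
import numpy as np

# Low-resolution depth samples (x, y, z) assumed to lie on a local plane,
# as the method exploits piecewise locally planar structure.
xs = np.array([0.0, 2.0, 0.0, 2.0])
ys = np.array([0.0, 0.0, 2.0, 2.0])
zs = np.array([1.0, 1.2, 1.4, 1.6])   # samples of the plane z = 0.1x + 0.2y + 1

# Fit z = a*x + b*y + c by least squares.
A = np.column_stack([xs, ys, np.ones_like(xs)])
(a, b, c), *_ = np.linalg.lstsq(A, zs, rcond=None)

def upsample(x, y):
    """Evaluate the fitted plane at a high-resolution pixel position."""
    return a * x + b * y + c

print(round(upsample(1.0, 1.0), 3))  # → 1.3
```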
Recently, augmented reality has become very popular and has appeared in our daily life through gaming, guidance systems, and mobile phone applications. However, inserting objects in such a way that their appearance seems natural is still an issue, especially in an unknown environment. This paper presents a framework that demonstrates the capabilities of the Kinect for convincing augmented reality in an unknown environment. Rather than pre-computing a reconstruction of the scene, as proposed by most previous methods, we propose a dynamic capture of the scene that adapts to live changes of the environment. Our approach, based on the update of an environment map, can also detect the position of the light sources. Combining information from the environment map, the light sources, and the camera tracking, we can display virtual objects on stereoscopic devices with global illumination effects such as diffuse and mirror reflections, refractions, and shadows in real time.
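One simple way to detect a light source from an environment map, in the spirit of the approach above, is to locate the brightest texel and convert it to a direction. The latitude-longitude layout and all values below are assumptions for illustration; the paper's actual detection procedure may differ.

```python
import math

# Toy latitude-longitude environment map (rows = elevation, cols = azimuth);
# values are luminance. Layout and numbers are illustrative only.
env_map = [
    [0.1, 0.2, 0.1, 0.1],
    [0.1, 0.9, 0.2, 0.1],   # bright texel -> detected light source
    [0.1, 0.1, 0.1, 0.1],
]

def detect_light(env):
    """Return (row, col) of the brightest texel and its 3D direction."""
    rows, cols = len(env), len(env[0])
    r, c = max(((i, j) for i in range(rows) for j in range(cols)),
               key=lambda ij: env[ij[0]][ij[1]])
    theta = math.pi * (r + 0.5) / rows          # elevation angle
    phi = 2.0 * math.pi * (c + 0.5) / cols      # azimuth angle
    direction = (math.sin(theta) * math.cos(phi),
                 math.cos(theta),
                 math.sin(theta) * math.sin(phi))
    return (r, c), direction

texel, light_dir = detect_light(env_map)
print(texel)  # → (1, 1)
```

The detected direction can then feed the shadow and reflection rendering.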
This paper presents the generation of a multi-layered see-through movie for an auto-stereoscopic display. This work is based on Diminished Reality (DR), one of the research fields of Augmented Reality (AR). In usual AR, virtual objects are added to the real world. DR, on the other hand, removes real objects from the real world; the background is visualized in place of the real objects (obstacles) to be removed. We use multiple color cameras and one TOF depth camera. The obstacle regions are identified with the depth camera based on the distance to the obstacles. The background behind the obstacles is recovered by planar projection of the multiple cameras. Then, the recovered background is overlaid onto the removed obstacles. For visualization on the auto-stereoscopic display, the scene is divided into multiple layers, such as obstacles and background. The pixels corresponding to the obstacles are hidden or rendered semi-transparently at the center viewpoints, so the obstacles appear diminished according to the viewpoint.
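The layering step above can be sketched as a per-pixel blend: pixels whose depth marks them as obstacles show the recovered background through them. The scanline values, threshold, and opacity below are all invented for the sketch.

```python
# Toy scanline: camera colors, background recovered by planar projection,
# and TOF depth values (metres). All numbers are invented for this sketch.
camera = [100, 100, 50, 50, 100]
recovered = [200, 200, 200, 200, 200]
depth = [0.8, 0.9, 3.0, 3.1, 0.7]

NEAR = 2.0    # pixels closer than this belong to the obstacle layer
ALPHA = 0.5   # remaining obstacle opacity at the centre viewpoints

def diminish(alpha):
    """Show the recovered background through obstacle pixels by blending."""
    return [alpha * cam + (1 - alpha) * rec if d < NEAR else float(cam)
            for cam, rec, d in zip(camera, recovered, depth)]

print(diminish(ALPHA))  # → [150.0, 150.0, 50.0, 50.0, 150.0]
```

Setting `alpha` to zero removes the obstacles entirely; intermediate values give the semi-transparent view at the center viewpoints.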
In this paper, we propose a novel method of representing the complex surface of a 3D object for a new aerial 3D display that can draw dots of light at arbitrary positions in space. The aerial 3D display used in this research can create a dot of light at 50 kHz and can draw dots of light by vector scanning. The proposed method generates point sequence data for the aerial 3D display from 3D surface models consisting of polygonal patches, the kind of polygonal models generally used in computer graphics. The method represents the surface with contours, formed by intersecting the object with cross-sectional planes, as sequences of points for vector scanning. In this research, several polygonal models, such as a face and a hand, are examined in experiments. The drawing experiments show that the polygonal models can be drawn successfully by the proposed method.
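The contour extraction described above can be sketched by intersecting triangle edges with a cross-sectional plane. The triangle coordinates and slicing plane below are illustrative; the real system would also order the points into a scan path.

```python
def slice_edges(triangles, z0):
    """Intersect each triangle edge with the plane z = z0 and collect the
    intersection points that form a contour for vector scanning.
    Edges merely touching the plane are skipped for brevity."""
    points = []
    for tri in triangles:
        for a, b in ((tri[0], tri[1]), (tri[1], tri[2]), (tri[2], tri[0])):
            za, zb = a[2], b[2]
            if (za - z0) * (zb - z0) < 0:     # edge strictly crosses the plane
                t = (z0 - za) / (zb - za)
                points.append(tuple(a[i] + t * (b[i] - a[i]) for i in range(3)))
    return points

# One illustrative triangle spanning z = 0 .. 1, sliced at z = 0.5.
tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0))
print(slice_edges([tri], 0.5))  # → [(0.5, 0.0, 0.5), (0.0, 0.5, 0.5)]
```

Repeating this for a stack of planes yields the point sequences the display draws by vector scanning.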
We present a novel 3D display that can show any 3D content in free space using laser-plasma scanning in the air. The laser-plasma technology can generate a point of illumination at an arbitrary position in free space. By scanning the position of the illumination, we can display a set of illuminated points in space, which realizes a 3D display in free space. This 3D display was presented in the Emerging Technologies program of SIGGRAPH 2006 and is the basic platform of our 3D display project. In this presentation, we introduce the history of the development of the laser-plasma scanning 3D display, and then describe recent developments in 3D content analysis and processing technology for realizing an innovative media presentation in free 3D space. One recent development allows preferred 3D content data to be supplied to the 3D display in a very flexible manner. This means that we have a platform for developing interactive 3D content presentation systems using the 3D display, such as an interactive art presentation. We also present the future plan of this 3D display research project.
A calibration method for a multiple-rangefinder system is proposed in this paper. Generally, multiple rangefinders are required to obtain the whole surface shape of a large object such as a human body. The proposed method solves range data registration by a prior calibration using a reference plane with rectangular markers. The world coordinate system is defined on the reference plane. Because the range data contain about two hundred thousand 3-D points, the normal vector of the reference plane can be accurately estimated by fitting a regression plane to the 3-D points. The Z-axis of the world coordinate system is defined as the axis perpendicular to the reference plane, and is therefore determined by the normal vector. The X and Y axes are defined along the horizontal and vertical lines of the rectangular markers, and are determined by detecting and extracting the rectangular markers from the intensity image. The orientation of each rangefinder is then estimated with respect to this world coordinate system. In the experiments, a system consisting of twelve rangefinders is used. Experimental results indicate an RMSE of 2.3 mm when measuring a cylindrical object.
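The regression-plane fit described above can be sketched as a total least squares problem: the normal of the best-fit plane is the singular vector of the centred point cloud with the smallest singular value. The synthetic plane coefficients and noise level below are assumptions for the sketch.

```python
import numpy as np

# Synthetic range data: points near the plane z = 0.5x - 0.25y + 2.
rng = np.random.default_rng(0)
xy = rng.uniform(-1.0, 1.0, size=(1000, 2))
z = 0.5 * xy[:, 0] - 0.25 * xy[:, 1] + 2.0 + rng.normal(0.0, 1e-3, 1000)
points = np.column_stack([xy, z])

# Regression plane via total least squares: the right singular vector with
# the smallest singular value of the centred cloud is the plane normal.
centred = points - points.mean(axis=0)
_, _, vt = np.linalg.svd(centred, full_matrices=False)
normal = vt[-1]
if normal[2] < 0:          # fix the sign so the normal points upward
    normal = -normal
print(np.round(normal, 3))
```

With hundreds of thousands of points, as in the paper, this averaging makes the estimated Z-axis very stable.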
A practical high-resolution three-dimensional measurement system based on space-code light pattern projection and phase-shifted light pattern projection is presented. Three-dimensional measurement devices, so-called rangefinders, are expected to be applied in the apparel, medical, and various other fields. The performance of a rangefinder is evaluated by its measurement time, depth accuracy, size, etc. A system using the space code technique can stably acquire the depth of an object, although the depth resolution is limited because the object space is coded into wedge-shaped regions. In a phase shift technique, on the other hand, high-resolution depth can theoretically be acquired because the object space is divided finely by the phase-shifted light projection, but it is difficult to acquire the depth stably because of the phase connection problem. In this paper, both problems, limited resolution and phase connection, are solved by combining the space code technique and the phase shift technique. The effectiveness of this system is also described.
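The combination can be sketched as phase unwrapping: the coarse space code identifies which projected period a point lies in, and the phase shift measurement supplies the fine phase within that period. The numeric example is invented.

```python
import math

def unwrap(period_index, wrapped_phase):
    """Combine the coarse space-code period index with the wrapped
    phase-shift measurement into an absolute, high-resolution phase.
    This resolves the phase connection (unwrapping) ambiguity."""
    return 2.0 * math.pi * period_index + wrapped_phase

# A surface point in the 3rd projected period with a fine phase of 1.2 rad
# gets the absolute phase below, from which depth is triangulated.
print(round(unwrap(3, 1.2), 4))  # → 20.0496
```

Without the space code, `period_index` would be unknown and the fine phase alone would be ambiguous across periods.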
In this paper, a method that estimates the trajectory of a vehicle from a single vehicle-mounted camera is proposed. The proposed method is a model-based method that assumes the vehicle is running on a planar road. The input image is converted to a Top-View image, and a matching (registration) against the next Top-View image is performed. The registration is done for an assumed velocity parameter and repeated over all candidate parameters. In this paper, a simple motion model and a particle filter are introduced to decrease the computation cost: the simple model constrains the registration of the Top-View images, and the particle filter decreases the number of candidate parameters. The position of the camera is obtained by accumulating the velocity parameters. Experiments show three results: a sufficient decrease of the computation cost, a plausible estimated trajectory, and a computation cost small enough to estimate the vehicle trajectory.
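The particle filter's role above can be sketched as follows: particles are candidate velocity parameters, weighted by how well the assumed velocity explains the Top-View registration. The registration score below is a synthetic stand-in peaking at an assumed true velocity; everything numeric is invented for the sketch.

```python
import math
import random

random.seed(0)
TRUE_VELOCITY = 1.5   # synthetic ground truth, used only to fake the score

def registration_score(v):
    """Stand-in for the Top-View registration: in the real system this
    would compare warped consecutive Top-View images; here it simply
    peaks at the (assumed) true velocity."""
    return math.exp(-((v - TRUE_VELOCITY) ** 2) / 0.02)

# Particles are candidate velocity parameters.
particles = [random.uniform(0.0, 3.0) for _ in range(300)]
for _ in range(5):
    weights = [registration_score(v) for v in particles]
    resampled = random.choices(particles, weights=weights, k=len(particles))
    # Jitter after resampling plays the role of the simple motion model.
    particles = [v + random.gauss(0.0, 0.02) for v in resampled]

estimated_velocity = sum(particles) / len(particles)
print(round(estimated_velocity, 2))
```

Only the particles' velocities need to be evaluated each frame, which is how the filter cuts down the number of candidate parameters.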
This paper presents an approach to capturing the visual appearance of a real environment such as the interior of a room. We propose a method for generating arbitrary viewpoint images by building a light field with an omni-directional camera, which can capture a wide field of view. The omni-directional camera used in this technique is a special camera with a hyperbolic mirror mounted above the camera, so that we can capture the luminosity of the environment over 360 degrees in a single image. We apply the light field method, a technique of Image-Based Rendering (IBR), to generate the arbitrary viewpoint images. The light field is a kind of database that records the luminosity information in the object space. We employ the omni-directional camera for constructing the light field, so that we can collect images of many view directions in the light field. Thus our method allows the user to explore a wide scene and achieves a realistic representation of the virtual environment. To demonstrate the proposed method, we capture an image sequence of our lab's interior environment with an omni-directional camera, and successfully generate arbitrary viewpoint images for a virtual tour of the environment.
This paper presents a novel method for virtual view generation that allows viewers to fly through a real soccer scene. A soccer match is captured by multiple cameras at a stadium, and images of arbitrary viewpoints are synthesized by view interpolation of the two real camera images nearest the given viewpoint. In the proposed method, the cameras do not need to be strongly calibrated; the epipolar geometry between the cameras is sufficient for the view interpolation. Therefore, the method can easily be applied to a dynamic event even in a large space, because the effort of camera calibration is reduced. The soccer scene is classified into several regions, and virtual view images are generated based on the epipolar geometry in each region. Superimposition of these images completes the virtual view of the whole soccer scene. An application for fly-through observation of a soccer match is introduced, together with the view-synthesis algorithm and experimental results.
In this paper, we propose a method of tracking soccer players using multiple views. Since much research on soccer scene analysis relies on the trajectories of the players and the ball, it is desirable to track soccer players robustly. Player tracking enables strategy analysis, scene recovery, scene generation for broadcasting, and automatic camera control. However, occlusion occurs frequently in soccer, and tracking often fails when players occlude each other; it is difficult to track the players with a single camera alone. Therefore, we use multiple view images to avoid the occlusion problem and obtain robust player tracking. As a first step, intra-camera processing is performed independently in each camera to track the players. Whenever a player cannot be tracked within a camera, inter-camera processing is performed as a second step: the tracking information of all cameras is integrated using the geometric relationship between cameras known as a homography. Inter-camera processing makes it possible to obtain the location of a player who is not detected in the image, who is occluded by another player, or who is outside the field of view. Experimental results show that robust player tracking is achieved by exploiting multiple cameras.
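The homography-based transfer between cameras can be sketched as a homogeneous point mapping. The 3x3 matrix below is an invented example (a scale plus translation); in the real system it would be estimated from ground-plane correspondences between the cameras.

```python
def apply_homography(H, point):
    """Map a player's ground-plane position from one camera image to
    another using the 3x3 homography H (homogeneous coordinates)."""
    x, y = point
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

# Illustrative homography: scale by 2 and translate by (10, 5).
H = [[2.0, 0.0, 10.0],
     [0.0, 2.0, 5.0],
     [0.0, 0.0, 1.0]]
print(apply_homography(H, (3.0, 4.0)))  # → (16.0, 13.0)
```

When a player is occluded in one view, their position tracked in another view can be transferred through such a mapping.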
Analysis of the dermo-epidermal surface in three dimensions is important for evaluating cosmetics. One approach is based on the active contour model, which is used for extracting local object boundaries as closed curves. The dermo-epidermal surface, however, is an open surface. We have developed a method for automatically extracting the dermo-epidermal surface from volumetric confocal microscopic images, as well as constructing a 3-D visual model of the surface using the geometric information contained in the control points. Our method is a 3-D extension of the active contour model, so we call it the active open surface model (AOSM). The initial surface for the AOSM is an open curved plane, guided by a 3-D internal force, a 3-D external constraint force, and a 3-D image force, which pull it toward the target surface. The proposed technique has been applied to extracting actual dermo-epidermal surfaces from volumetric confocal microscopic images.
In this paper, we introduce a geometric registration method for augmented reality (AR) and an application system, an interior simulator, in which virtual (CG) objects can be overlaid onto a real-world space. The interior simulator is developed as an example AR application of the proposed method. Using the interior simulator, users can visually simulate the placement of virtual furniture and articles in a living room, so that they can easily design the interior without placing real furniture, viewing the room from many different locations and orientations in real time. In our system, two base images of the real-world space are captured from two different views to define a projective coordinate frame of the 3D object space. Each projective view of a virtual object in the base images is then registered interactively. After this coordinate determination, an image sequence of the real-world space is captured by a hand-held camera while non-metric feature points are tracked for overlaying the virtual objects. Virtual objects can be overlaid onto the image sequence by relating successive images to each other. With the proposed system, 3D position tracking devices, such as magnetic trackers, are not required for overlaying virtual objects. Experimental results demonstrate that 3D virtual furniture can be overlaid onto an image sequence of a living room scene at nearly video rate (20 frames per second).
In spaces where human workers and industrial robots work together, it has become necessary to monitor robot motion for safety. For such robot surveillance, we propose a robot tracking system based on multiple view images. In this system, we treat robot movement tracking as the problem of estimating the pose parameters in every frame. The tracking algorithm consists of four stages: an image generation stage, an estimation stage, a parameter search stage, and a prediction stage. In the first stage, the robot region of the real image is extracted by background subtraction; the YUV color space is used to reduce sensitivity to lighting changes. By calibrating the extrinsic and intrinsic parameters of all cameras with Tsai's method, we can project the 3D model of the robot onto each camera view. In the next stage, the correlation between the input image and the projected model image is computed, defined over the robot regions in the real and rendered images. In the third stage, the pose parameters of the robot are estimated by maximizing this correlation. For computational efficiency, the high-dimensional pose parameter space is divided into many low-dimensional sub-spaces according to the pose parameters predicted from the previous frame. We apply the proposed system to pose estimation of a 5-axis robot manipulator. The estimated pose parameters successfully match the actual poses of the robot.
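The first stage, background subtraction, can be sketched as a per-pixel difference against a background model. For brevity this sketch uses a single grayscale channel with invented values, whereas the paper works in the YUV color space.

```python
def extract_robot_region(frame, background, threshold=20):
    """Background subtraction: pixels differing from the background model
    by more than the threshold are classified as the robot region."""
    return [[abs(f - b) > threshold for f, b in zip(frame_row, bg_row)]
            for frame_row, bg_row in zip(frame, background)]

background = [[50, 50, 50],
              [50, 50, 50]]
frame = [[50, 120, 50],      # the bright pixels are the (toy) robot
         [50, 125, 52]]
print(extract_robot_region(frame, background))
# → [[False, True, False], [False, True, False]]
```

The resulting binary mask is what gets correlated against the projected 3D model in the later stages.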
Recently, there has been increasing interest in capturing 3D models of real objects. A range scanner can acquire high-quality shape data of an object, but the texture image for surface rendering obtained by the scanner is generally neither high-resolution nor high-quality. High-resolution color images at different positions are therefore taken in addition to the range data, so that more realistic images can be rendered from the captured 3D model using such high-quality textures. We propose a method for building a 3D model of high quality in both shape and appearance by aligning multiple-view range images obtained by a range scanner with multiple-view color images taken by a digital camera around the object. The color images used as textures are calibrated by Tsai's method, which also calibrates lens distortion. A surface model of the object is created by registering and integrating range data sets taken from multiple directions. For registration, we use a color ICP (Iterative Closest Point) algorithm that aligns two surfaces using the color images and the surface shape of the object. For integration, we build a voxel model from the range images and then extract polygons with the Marching Cubes algorithm. We apply the high-resolution textures to the surfaces by blending, and finally reconstruct a realistic 3D model in both shape and appearance.
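The rigid alignment at the heart of each ICP iteration can be sketched with the closed-form Kabsch step: given corresponding points, the optimal rotation comes from an SVD of their cross-covariance. The point sets here are synthetic, correspondences are exact, and the reflection (determinant) check of a full implementation is omitted; the paper's color ICP additionally weights matches by color.

```python
import numpy as np

# Source points and a target created by a known rotation + translation.
src = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
angle = np.deg2rad(30)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
dst = src @ R_true.T + np.array([0.5, -0.2, 1.0])

# Kabsch step: SVD of the cross-covariance of the centred point clouds.
sc, dc = src - src.mean(axis=0), dst - dst.mean(axis=0)
U, _, Vt = np.linalg.svd(sc.T @ dc)
R = (U @ Vt).T                      # rotation mapping src onto dst
t = dst.mean(axis=0) - R @ src.mean(axis=0)

print(np.allclose(src @ R.T + t, dst))  # → True
```

In real ICP the correspondences come from closest-point search and the step is iterated; with exact correspondences one step already recovers the transform.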
In this paper, we describe a system for reconstructing 3D shape from multiple view images using an octree and silhouettes. Our system consists of four calibrated cameras. Each camera is connected to a PC that locally extracts the silhouette from the image captured by the camera. The four silhouette images and camera images are then sent to a host computer to perform the 3D reconstruction. To make the reconstruction faster, the 3D object space is represented by an octree structure: if an octant does not consist entirely of the same type of voxels, it is further subdivided until homogeneous cubes, possibly single voxels, are obtained. Each cube is projected into all silhouette images, and the projected cube region is intersected with the silhouette region. We develop a new algorithm for fast octree construction. The algorithm reduces the cost of checking whether a node must project its 8 cube vertices to the image plane by using a stack that keeps the parent's temporary cube type. With this algorithm, our system runs in near real time (about 5 frames per second) for generating the 3D shape of a human in voxel representation.
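The intersection test that the octree accelerates can be sketched at the voxel level: a voxel survives only if its projection falls inside every silhouette. The toy setup below uses two orthographic "cameras" and hand-made silhouettes; the actual system uses four calibrated perspective cameras and hierarchical cube tests.

```python
def carve(voxels, silhouettes, project):
    """Keep a voxel only if its projection lies inside every silhouette
    (the visual-hull intersection that the octree test accelerates)."""
    return [v for v in voxels
            if all(project(v, cam) in sil
                   for cam, sil in enumerate(silhouettes))]

# Toy setup: two orthographic cameras looking down the z and x axes.
def project(v, cam):
    x, y, z = v
    return (x, y) if cam == 0 else (y, z)

silhouettes = [{(0, 0), (1, 0)},       # camera 0 sees these (x, y) pixels
               {(0, 0), (0, 1)}]       # camera 1 sees these (y, z) pixels
voxels = [(0, 0, 0), (1, 0, 0), (0, 0, 1), (1, 1, 1)]
print(carve(voxels, silhouettes, project))
# → [(0, 0, 0), (1, 0, 0), (0, 0, 1)]
```

The octree applies the same inside/outside test to whole cubes first, subdividing only the ambiguous ones, which is what makes near-real-time operation possible.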
We propose a method for three-dimensional (3D) reconstruction of an environment from an image sequence taken with a hand-held video camera. Conventional EPI analysis is a popular method for 3D reconstruction from an image sequence captured by a moving camera, but the camera must move linearly with constant velocity. We propose a method that makes EPI analysis applicable to an image sequence taken with a hand-held video camera of unknown motion. The proposed method removes irregular hand shake from the input image sequence by tracking feature points and applying the factorization method. We then apply conventional EPI analysis to the corrected image sequence as if it had been taken by a camera moving horizontally with constant velocity. The method provides the 3D positions of the feature points in the scene. In addition, we estimate the surface of the environmental structure from the obtained 3D points. In the surface estimation, the 3D positions of the feature points are adjusted by matching the texture of the input images. Finally, a 3D model represented by triangle meshes is reconstructed. By rendering the corresponding image region onto each mesh, the proposed method reconstructs the 3D scene with more realism than the traditional approach.
In this paper, we propose a method for generating arbitrary view images by interpolating images between three cameras using epipolar geometry. Projective geometry has recently been used in the field of computer vision because it can be determined more easily than Euclidean geometry. In the proposed method, the three input camera images are rectified so that the vertical and horizontal directions are completely aligned with the epipolar planes between the cameras. This rectification provides a Projective Voxel Space (PVS), in which the three axes are aligned with the cameras' projection directions. Such alignment simplifies the procedures for projection and back-projection between the 3D space and the image planes. First, we apply shape-from-silhouette, taking advantage of the PVS. The consistency of color values between the images is then evaluated for the final determination of the object surface voxels. In this way, consistent matching across the three images is estimated, and images can be interpolated from the matching information. Because the synthesized images are based on the 3D shape in the PVS, object occlusions are reproduced in the generated images, while only weak calibration is required.
In this paper, we propose a method for arbitrary view generation from multiple view images taken with an uncalibrated camera system. In the Projective Grid Space (PGS), the 3D space defined by the epipolar geometry between two basis cameras among the multiple cameras, we reconstruct a 3D shape model from the silhouette images of the multiple cameras. For the shape reconstruction in the PGS, the multiple cameras do not have to be fully calibrated; only the fundamental matrices relating every camera to the two basis cameras must be obtained. Using the 3D model reconstructed in the PGS, we obtain point correspondences between arbitrary pairs of images, from which the image of an arbitrary view between the pair can be generated.
We propose a novel method to synthesize high-resolution images by constructing a light field from an image sequence taken with a moving video camera. Our method integrates multiple frames, each of which partly captures the object, by constructing a light field, which is quite different from general mosaicing methods. When the light field is constructed straightforwardly, blur and discontinuities are introduced into the generated images by the depth variation of the object. In our method, the light field is optimized to remove this blur and discontinuity, so that clear images can be generated. The optimized light field adapts to the depth variation of the object surface, but the exact shape of the object is not needed. Extremely high-resolution images that would be impractical to capture with a real system can be virtually generated from the light field. Experimental results on a book surface demonstrate the effectiveness of the proposed method.
First, this paper gives an outline of the activities of the Humanities Media Interface (HUMI) Project. The project was established by Keio University for the purpose of digitally archiving rare books held in the Keio University Library and of realizing a research-oriented digital library. We then introduce our way of acquiring super-high-definition images of rare books, and propose an image compensation method for obtaining an exactly frontal view of a page using the 3-D information extracted from the shape of the top line of the page area depicted in the image. Our approach of acquiring higher-resolution images by joining close-up partial images of a page is also introduced. The proposed image adjustment method is extended to partial page images as a preprocess for joining them together. In the experiments, well-adjusted and well-joined page images were obtained.
Shape modeling is an important issue for many applications, for example, object recognition for robot vision and virtual environment construction. In this paper, a new method for obtaining a polyhedral model from multi-view images using genetic algorithms (GAs) is proposed. In this method, a similarity between the model and every input image is calculated, and the model with the maximum similarity is sought. To find this model, a genetic algorithm is used as the optimization method. In the genetic algorithm, a sharing scheme is employed for efficient detection of multiple solutions, because a shape may be represented by multiple shape models. Results of modeling experiments with real multi-view images demonstrate that the proposed method can robustly generate models using the GA.
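The sharing scheme mentioned above can be sketched as follows: each individual's fitness is divided by a niche count, so that crowded optima are penalized and multiple solutions can coexist in the population. The one-dimensional individuals, triangular sharing function, and two-peak fitness below are all invented for illustration.

```python
def shared_fitness(population, fitness, sigma=0.5):
    """Divide each individual's fitness by its niche count (sum of a
    triangular similarity kernel over the population) so that multiple
    optima, e.g. multiple plausible shape models, can coexist."""
    shared = []
    for x in population:
        niche = sum(max(0.0, 1.0 - abs(x - y) / sigma) for y in population)
        shared.append(fitness(x) / niche)
    return shared

# Two individuals crowd the peak at 0; one sits alone at the peak at 3.
fitness = lambda x: max(0.0, 1.0 - min(abs(x), abs(x - 3.0)))
population = [0.0, 0.1, 3.0]
print([round(f, 3) for f in shared_fitness(population, fitness)])
# → [0.556, 0.5, 1.0]
```

The lone individual near the second optimum keeps its full fitness, while the crowded ones are discounted, which is what lets the GA retain several distinct shape models.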