In this paper, we propose a robust camera tracking method that uses disparity images computed from the known parameters of a 3D camera and multiple epipolar constraints. We assume that the baselines between the lenses of the 3D camera and the intrinsic parameters are known. The proposed method reduces the camera motion uncertainty encountered during tracking. Specifically, we first obtain corresponding feature points between the initial lens images using a normalized correlation method, and from these matched features we compute disparity images. As the camera moves, the corresponding feature points obtained from each lens of the 3D camera are robustly tracked via the Kanade-Lucas-Tomasi (KLT) tracking algorithm. Secondly, the relative pose parameters of each lens are calculated from Essential matrices, which are computed from Fundamental matrices estimated with the normalized 8-point algorithm in a RANSAC scheme. We then determine the scale factor of the translation by d-motion; this is required because the camera motion recovered from an Essential matrix is only defined up to scale. Finally, we optimize the camera motion using the multiple epipolar constraints between lenses and the d-motion constraints computed from the disparity images. The proposed method can be widely adopted in Augmented Reality (AR) applications, 3D reconstruction using a 3D camera, and surveillance systems that need not only depth information but also camera motion parameters in real time.
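The normalized 8-point algorithm mentioned above can be sketched as follows. This is a minimal numpy illustration of the standard technique (point normalization, a linear solve via SVD, and the rank-2 constraint), not the paper's actual implementation; in practice it would be wrapped in a RANSAC loop as the abstract describes.

```python
import numpy as np

def normalized_eight_point(x1, x2):
    """Estimate the fundamental matrix F from N >= 8 correspondences.

    x1, x2: (N, 2) arrays of matched points, one row per correspondence.
    Returns a 3x3 rank-2 matrix F satisfying x2_h^T @ F @ x1_h ~= 0.
    Illustrative helper, not the paper's code.
    """
    def normalize(pts):
        # Translate to the centroid, scale so the mean distance is sqrt(2).
        c = pts.mean(axis=0)
        d = np.sqrt(((pts - c) ** 2).sum(axis=1)).mean()
        s = np.sqrt(2) / d
        T = np.array([[s, 0, -s * c[0]],
                      [0, s, -s * c[1]],
                      [0, 0, 1.0]])
        ph = np.column_stack([pts, np.ones(len(pts))]) @ T.T
        return ph, T

    p1, T1 = normalize(x1)
    p2, T2 = normalize(x2)
    # Each correspondence contributes one row of the linear system A f = 0.
    A = np.column_stack([
        p2[:, 0] * p1[:, 0], p2[:, 0] * p1[:, 1], p2[:, 0],
        p2[:, 1] * p1[:, 0], p2[:, 1] * p1[:, 1], p2[:, 1],
        p1[:, 0], p1[:, 1], np.ones(len(p1))])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Enforce the rank-2 constraint by zeroing the smallest singular value.
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt
    # Undo the normalization.
    return T2.T @ F @ T1
```

With known intrinsics K, the Essential matrix follows as E = K'^T F K, from which the up-to-scale rotation and translation are extracted.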
This paper presents a new framework for an immersive kendo game with an intelligent cyber-fighter, which has its own internal needs, motivations, sets of multimodal sensors, a motor system, and a behavior system. Unlike conventional interfaces such as a keyboard or joystick, the proposed system provides a more natural and comfortable interface by exploiting multimodal input such as 3D vision and speech recognition. In addition, the proposed 3D vision-based interface allows relatively free movement in 3D space compared with wired tracker-based interfaces. As a result, a user holding a real sword can experience an immersive fight with the cyber-fighter in a virtual environment. The proposed framework will have a wide variety of applications in VR-based edutainment.
Recent deployments of digital cameras and fast PCs have led real-time interactive systems to adopt vision-based interfaces. However, 2D vision-based interfaces, due to the lack of 3D information, have an inherent weakness in tracking 3D objects in the real world. In this paper, we propose a vision-based 3D interface for interactive systems that allows object tracking on the fly in 3D by exploiting multi-view images. Because interactive systems require real-time processing, the main challenge is a simple and robust estimation of 3D information from images captured by asynchronous digital cameras. The proposed vision-based 3D tracking consists of three steps: (i) dynamic stereo calibration, (ii) simple object segmentation, and (iii) robust 3D movement tracking. To show the effectiveness of the proposed framework, we applied the proposed 3D interface to an interactive 3D Gumdo simulation. Due to its simplicity and robustness, the proposed framework can be applied to various real-time interactive applications.
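Once the stereo pair is calibrated, recovering a 3D object position from its two projections reduces to triangulation. The sketch below shows the standard direct linear transform (DLT) triangulation under the assumption of known projection matrices; the function name and interface are illustrative, not the paper's actual code.

```python
import numpy as np

def triangulate_dlt(P1, P2, u1, u2):
    """Recover a 3D point from its projections in two calibrated views.

    P1, P2: (3, 4) camera projection matrices from the stereo calibration.
    u1, u2: (x, y) observations of the same object point in each view.
    """
    # Each view contributes two rows: u * (p3 . X) - (p1 . X) = 0, etc.
    A = np.vstack([
        u1[0] * P1[2] - P1[0],
        u1[1] * P1[2] - P1[1],
        u2[0] * P2[2] - P2[0],
        u2[1] * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # dehomogenize
```

Running this per frame on the segmented object centroid yields the 3D trajectory used by the interface.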
In this paper, we propose a vision-based 3D interface exploiting invisible 3D boxes arranged in the personal space (i.e., the space reachable by the body without traveling), which allows robust yet simple dynamic gesture tracking and analysis without resorting to complicated sensor-based motion tracking systems. Vision-based gesture tracking and analysis is still a challenging problem, even though we have witnessed rapid advances in computer vision over the last few decades. The proposed framework consists of three main parts: (1) object segmentation without a bluescreen and 3D box initialization with depth information, (2) movement tracking by observing how the body passes through the 3D boxes in the personal space, and (3) movement feature extraction based on Laban's Effort theory and movement analysis by mapping features to meaningful symbols using time-delay neural networks. Exploiting depth information from multiview images improves the performance of gesture analysis by reducing the errors introduced by simple 2D interfaces. In addition, the proposed box-based 3D interface lessens the difficulties both in tracking movement in 3D space and in extracting low-level features of the movement. Furthermore, the time-delay neural networks lessen the difficulties in movement analysis by training. Due to its simplicity and robustness, the framework will provide interactive systems, such as the ATR I-cubed Tangible Music System or the ATR Interactive Dance System, with an improved 3D interface. The proposed simple framework can also be extended to other applications requiring dynamic gesture tracking and analysis on the fly.
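The box-based tracking idea can be sketched very simply: maintain a set of axis-aligned boxes in the personal space and record which ones the tracked body point passes through. The box layout, function names, and trajectory format below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def active_boxes(point, origins, size):
    """Return indices of the invisible 3D boxes that a tracked body point
    currently occupies. `origins` is an (N, 3) array of box minimum corners
    and `size` the common edge length."""
    p = np.asarray(point)
    inside = np.all((p >= origins) & (p < origins + size), axis=1)
    return np.flatnonzero(inside)

def movement_signature(trajectory, origins, size):
    """Convert a 3D trajectory into the ordered sequence of boxes it
    passes through -- a simple low-level feature for gesture analysis."""
    seq = []
    for p in trajectory:
        for b in active_boxes(p, origins, size):
            if not seq or seq[-1] != b:
                seq.append(int(b))
    return seq
```

Such box-transition sequences are the kind of discrete, low-dimensional feature stream that can then be fed to a time-delay neural network for classification.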
In this paper, we report on a convenient and unified framework for generating a photo-realistic interactive virtual environment (piVE) using heterogeneous multiview cameras, without using bluescreen techniques or special rendering hardware. In spite of the rapid growth of computer hardware, rendering a photo-realistic virtual environment on the fly is still a challenging problem. With the proposed framework, which exploits stereo images/videos, a piVE can be rendered in real time without an expensive high-end computer with dedicated rendering hardware. The proposed framework consists of three main parts: (1) photo-realistic virtual space generation exploiting a camera with a stereoscopic adapter, (2) generation of a video avatar (a special object representing the user) by exploiting a multiview camera, and (3) graphics object rendering according to the given camera parameters and the user's interaction. We also address z-keying issues among the background video, graphics objects, and the video avatar.
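The z-keying mentioned above is, at its core, per-pixel depth-based compositing: for each pixel, the layer closest to the camera wins. The following is a minimal numpy sketch under assumed array shapes; the layer names and data format are illustrative, not the paper's actual pipeline.

```python
import numpy as np

def z_key(colors, depths):
    """Composite several layers (e.g. background video, graphics objects,
    video avatar) by per-pixel depth comparison -- the core of z-keying.

    colors: list of (H, W, 3) images; depths: list of (H, W) depth maps,
    where a smaller depth value means closer to the camera."""
    colors = np.stack(colors)            # (L, H, W, 3)
    depths = np.stack(depths)            # (L, H, W)
    nearest = depths.argmin(axis=0)      # winning layer index per pixel
    h, w = nearest.shape
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    return colors[nearest, yy, xx]
```

A real system would also need consistent depth scales across the video and graphics layers, which is exactly the z-keying issue the paper addresses.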
In this paper, we have developed a theoretical framework for coherent image segmentation using stereo images. Robust segmentation is performed by combining multiple cues such as shape, intensity (color), and depth. Though image segmentation has been an active research field over the last few decades, segmentation based on an individual cue has several well-known drawbacks. For example, intensity-based schemes tend to generate detailed but inaccurate edges, and motion-based schemes only help segment moving objects. In addition, depth-based schemes may not yield satisfactory segmentation results because disparity estimation itself is a well-known ill-posed problem. Therefore, the main issue in segmentation is how to combine various cues to achieve robust segmentation results. In the proposed scheme, robust and consistent segmentation is achieved by properly combining several cues using an MRF/GRF model. We first estimate the intensity edges of the image and then re-evaluate the edges based on disparity edge information. In turn, the resulting intensity edges can help estimate an accurate disparity field. In addition, occluded areas can be segmented by properly combining the intensity edges of the stereo images.
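The cue-combination step of re-evaluating intensity edges with disparity edges can be illustrated with a simple consistency test: keep an intensity edge pixel only if a disparity edge lies nearby. This dilation-based check is an assumption made for illustration; the paper's actual formulation is the full MRF/GRF model.

```python
import numpy as np

def reevaluate_edges(intensity_edges, disparity_edges, radius=1):
    """Keep an intensity edge pixel only if a disparity edge lies within
    `radius` pixels. Inputs are boolean (H, W) edge maps."""
    h, w = disparity_edges.shape
    support = np.zeros_like(disparity_edges)
    # Dilate the disparity edge map by `radius` using plain array shifts.
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = np.zeros_like(disparity_edges)
            ys = slice(max(dy, 0), h + min(dy, 0))
            xs = slice(max(dx, 0), w + min(dx, 0))
            ys_src = slice(max(-dy, 0), h + min(-dy, 0))
            xs_src = slice(max(-dx, 0), w + min(-dx, 0))
            shifted[ys, xs] = disparity_edges[ys_src, xs_src]
            support |= shifted
    return intensity_edges & support
```

Edges that survive this test are the "depth-consistent" intensity edges that can in turn anchor a more accurate disparity estimate.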
In this paper, we propose a modified overlapped block matching (OBM) scheme for stereo image coding. OBM was introduced in video coding as a promising way to reduce blocking artifacts by using multiple vectors per block while maintaining the advantages of the fixed-size block matching framework. However, OBM has its own limitations, even though it overcomes some drawbacks of block matching schemes. For example, to estimate an optimal displacement vector (DV) field, OBM requires complicated iterations. In addition, OBM does not always guarantee a consistent DV field, even after several iterations, because the estimation considers only the magnitude of the prediction error as a measure. Therefore, we propose a modified OBM scheme that allows both consistent disparity estimation and efficient disparity compensation without such iterations. In the proposed scheme, the computational burden resulting from iterations is reduced using 'open-loop' coding, which decouples the encoding into estimation and compensation. Consistent disparity estimation is performed by using a causal MRF model and a half-pixel search, while maintaining (or reducing) the energy level of the disparity-compensated difference frame. The compensation efficiency is improved by interpolating the reference image at half-pixel accuracy and by applying OBM in part. To demonstrate the efficiency of the proposed OBM scheme, we provide experimental results showing that it achieves higher PSNR, by about 0.5-1 dB, as well as better perceptual quality, at a fraction of the computation, compared to conventional OBM.
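The key mechanism of overlapped block compensation is that each block's vector predicts a window larger than the block, and overlapping weights blend the neighboring predictions. The 1D sketch below shows this with triangular weights whose shifted copies sum to one; real OBM operates on 2D images with a 2D window, and the function here is an illustration, not the paper's coder.

```python
import numpy as np

def obm_compensate_1d(reference, vectors, block=8):
    """Overlapped block compensation on a 1D signal: each block's
    displacement vector predicts a window twice the block size, and the
    overlapping triangular weights blend neighboring predictions,
    suppressing blocking artifacts."""
    n = len(reference)
    win = 2 * block
    k = np.arange(win)
    # Triangular window; copies spaced `block` apart overlap-add to 1.
    w = np.where(k < block, k + 0.5, win - k - 0.5) / block
    pred = np.zeros(n)
    wsum = np.zeros(n)
    for b, v in enumerate(vectors):
        start = b * block - block // 2   # window centered on block b
        for j in range(win):
            dst = start + j
            src = dst + v
            if 0 <= dst < n and 0 <= src < n:
                pred[dst] += w[j] * reference[src]
                wsum[dst] += w[j]
    # Normalize by the accumulated weight (handles clipped borders).
    return pred / np.maximum(wsum, 1e-9)
```

Because every sample is a weighted mix of the predictions from adjacent blocks, discontinuities at block boundaries are smoothed away, at the cost of the coupled estimation problem the paper's open-loop scheme avoids.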
In this paper, we address the problem of optimal bit allocation for stereo images. Conventional rate-distortion based methods have mainly concentrated on minimizing the total distortion within a given bit budget by independently encoding each image. However, stereo image coding, like video coding, requires a dependent bit allocation framework to further improve encoding performance, because binocular and spatial dependencies are introduced by the disparity estimation and the differential pulse code modulation (DPCM) of the disparity vector field. We first formulate the dependent bit allocation problem for stereo image coding and extend it to blockwise dependent bit allocation. We then focus on blockwise dependent quantization, because using open-loop disparity estimation decouples the dependent bit allocation problem into two independent problems: disparity estimation and dependent quantization. The encoding complexity and delay in the dependent quantization framework can be significantly reduced by exploiting the unidirectional binocular dependency. An optimal set of quantizers can be selected using the Viterbi algorithm. For a given set of three quantization scales, the proposed scheme provides higher PSNR, about 3 dB compared to JPEG without disparity compensation and 0.5 dB compared to optimal independent blockwise quantization with disparity compensation. The proposed scheme can help develop fast and efficient bit allocation strategies, serve as a benchmark for practical rate control schemes, or be used in asymmetric applications that allow offline encoding, such as CD-ROM, DVD, video-on-demand, etc.
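Selecting quantizers on a dependency trellis with the Viterbi algorithm can be sketched as a dynamic program over Lagrangian costs J = D + λR. The rate/distortion tables below are toy inputs made up for illustration; in the paper the dependency arises from open-loop disparity compensation between the stereo views.

```python
import numpy as np

def viterbi_quantizers(rate, dist, lam):
    """Pick one quantizer per block minimizing J = D + lambda * R on a
    dependency trellis.

    rate[i][q]    : bits for block i under quantizer q
    dist[i][p][q] : distortion of block i under quantizer q, given that
                    the block it depends on used quantizer p (block 0,
                    having no predecessor, is indexed with dummy p = 0).
    """
    n, nq = len(rate), len(rate[0])
    cost = np.array([dist[0][0][q] + lam * rate[0][q] for q in range(nq)])
    back = []
    for i in range(1, n):
        step = np.array([[cost[p] + dist[i][p][q] + lam * rate[i][q]
                          for q in range(nq)] for p in range(nq)])
        back.append(step.argmin(axis=0))   # best predecessor per state
        cost = step.min(axis=0)
    # Trace back the optimal quantizer sequence.
    q = [int(cost.argmin())]
    for bp in reversed(back):
        q.append(int(bp[q[-1]]))
    return q[::-1], float(cost.min())
```

The unidirectional dependency keeps the trellis first-order, so the search is O(n · nq²) instead of exponential in the number of blocks.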
The increasing demand for 3D imaging and recent developments in autostereoscopic displays will accelerate the use of 3D systems in various areas. However, limited channel bandwidth is, as for monocular images, the main bottleneck for realizing 3D systems. As a result, an efficient compression algorithm is essential to reduce the bandwidth requirement while maintaining the perceptual visual quality at the decoder. In this paper, we focus on the compression of stereo images. In stereo image coding, we can take advantage of binocular redundancy by using disparity compensation. The most popular disparity compensation approaches so far have been block-based methods, due mostly to their simplicity. Block-based methods, however, may suffer from blocking artifacts at low bit rates due to the assumption of uniform disparity within a fixed block. Meanwhile, if we reduce the block size, the disparity estimation may suffer from various noise effects, which increase the bit rate required for the disparity. Considering these observations, we estimate disparity based on a small block or a pixel with the energy equation derived from the MRF model. In order to prevent oversmoothing across boundaries, we use the combined intensity edges of the two images as an initial disparity boundary. Then, we segment the resulting smooth disparity field. Finally, the disparity and the starting position are encoded using DPCM, and the corresponding boundary is encoded using run-length chain coding. We conclude the paper with experimental results.
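DPCM, used above to encode the disparity values, simply transmits each value's difference from its predecessor. A minimal lossless sketch (the paper's coder would additionally entropy-code the residuals):

```python
def dpcm_encode(values, predictor=0):
    """Differentially encode a sequence of disparity values: emit only the
    difference from the previously coded value."""
    residuals = []
    prev = predictor
    for v in values:
        residuals.append(v - prev)
        prev = v
    return residuals

def dpcm_decode(residuals, predictor=0):
    """Invert dpcm_encode by accumulating the residuals."""
    values = []
    prev = predictor
    for r in residuals:
        prev = prev + r
        values.append(prev)
    return values
```

Because a smooth disparity field has small neighbor-to-neighbor differences, the residuals cluster near zero, which is exactly what makes the subsequent entropy coding efficient.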
In the coming years there will be an increasing demand for realistic 3-D display of scenes using such popular approaches as stereo or multi-view images. As the amount of information displayed increases, so does the need for digital compression to ensure efficient storage and transmission of the sequences. In this paper, we introduce a new approach to stereo image compression based on the MRF model and MAP estimation. The basic strategy is to encode the right image as a reference, then estimate the disparity between blocks in the right and left images and transmit the disparity along with the error between the disparity-compensated left image and the original. This approach has been used in the literature and is akin to the block matching technique used for motion compensation in video coders. Its main drawback is that as the block size becomes smaller, the overhead required to transmit the disparity map becomes too large. Also, simple block matching algorithms frequently fail to provide good matches because the correspondences are locally ambiguous due to noise, occlusion, and repetition or lack of texture. The novelty of our work is that, to compute the disparity map, we introduce an MRF model with its corresponding energy equation. This allows us to incorporate smoothness constraints, to take occlusion into account, and to minimize the effect of noise in the disparity map estimation. Obtaining a smooth disparity field is beneficial because it reduces the overhead required to transmit the disparity map. It is also useful for video coding, since the robustness against noise ensures that disparity maps in successive frames will be very similar. We describe this new formulation in detail and provide compression results.
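The flavor of MRF-based disparity estimation can be conveyed with a toy 1D scanline example: minimize a data term (matching error) plus a smoothness term on neighboring disparities, here via iterated conditional modes (ICM). This is a deliberately simplified sketch of the MRF/MAP idea; the paper's full energy also models occlusion, which is omitted here.

```python
import numpy as np

def icm_disparity(left, right, max_d, lam=1.0, iters=5):
    """Estimate per-pixel disparity along one scanline by minimizing
    E(d) = sum_i (left[i] - right[i - d_i])^2
         + lam * sum_i (d_i - d_{i-1})^2
    with iterated conditional modes."""
    n = len(left)
    # Initialize with the best independent (winner-take-all) match.
    d = np.zeros(n, dtype=int)
    for i in range(n):
        costs = [(left[i] - right[i - dd]) ** 2 if i - dd >= 0 else np.inf
                 for dd in range(max_d + 1)]
        d[i] = int(np.argmin(costs))
    # ICM sweeps: update each site to its conditionally optimal label.
    for _ in range(iters):
        for i in range(n):
            best, best_e = d[i], np.inf
            for dd in range(max_d + 1):
                if i - dd < 0:
                    continue
                e = (left[i] - right[i - dd]) ** 2
                if i > 0:
                    e += lam * (dd - d[i - 1]) ** 2
                if i < n - 1:
                    e += lam * (dd - d[i + 1]) ** 2
                if e < best_e:
                    best, best_e = dd, e
            d[i] = best
    return d
```

The smoothness weight lam is exactly the knob that trades matching fidelity against the smooth, cheaply transmittable disparity field the paper argues for.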