In this work, we propose a feasible 3D video generation method that enables high-quality visual perception using a monocular uncalibrated camera. Anthropometric distances between standard facial landmarks are approximated based on the person's age and gender. These measurements are used in a two-stage approach to facilitate the construction of binocular stereo images. Specifically, one view of the background is registered in the initial stage of video shooting. This is followed by an automatically guided displacement of the camera toward its secondary position. At the secondary position, real-time capture starts and the foreground (viewed person) region is extracted for each frame. After an accurate parallax estimation, the extracted foreground is placed in front of the background image that was captured at the initial position. The constructed full view of the initial position, combined with the view of the secondary (current) position, thus forms the complete binocular pairs during real-time video shooting. The subjective evaluation results indicate competent depth perception quality with the proposed system.
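The role of the anthropometric measurement in parallax estimation can be illustrated with a minimal sketch. It assumes an ideal pinhole camera, a face roughly parallel to the image plane, and a purely horizontal camera displacement; none of these assumptions, nor the function and parameter names, come from the abstract. Under them, the focal length cancels, which is one plausible reason a known landmark distance helps with an uncalibrated camera.

```python
def disparity_px(baseline_mm, landmark_px, landmark_mm):
    """Pinhole-model disparity (in pixels) of a fronto-parallel face.

    With focal length f (in pixels):
        depth     = f * landmark_mm / landmark_px
        disparity = f * baseline_mm / depth
    so f cancels and only the ratio of measured to real landmark
    distance is needed. All names here are hypothetical.
    """
    if landmark_px <= 0 or landmark_mm <= 0:
        raise ValueError("landmark distances must be positive")
    return baseline_mm * landmark_px / landmark_mm


# Hypothetical numbers: a 65 mm camera shift, and a landmark pair that is
# 60 mm apart on the face and 120 px apart in the image.
shift = disparity_px(65.0, 120.0, 60.0)
```

The numbers in the usage lines are illustrative only and are not taken from the paper.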
In this paper, we propose a hybrid 2D-to-3D video conversion system to recover the 3D structure of the scene. Depending on the scene characteristics, geometric or height depth information is adopted to form the initial depth map. This depth map is fused with color-based depth cues to construct the final depth map of the scene background. The depths of the foreground objects are estimated after their classification into human and non-human regions. Specifically, the depth of a non-human foreground object is directly calculated from the depth of the region behind it in the background. To acquire more accurate depth for the regions containing a human, the estimation of the distance between face landmarks is also taken into account. Finally, the computed depth information of the foreground regions is superimposed on the background depth map to generate the complete depth map of the scene, which is the main goal in the process of converting 2D video to 3D.
We propose a progressive mesh geometry coder, which expresses geometry information in terms of spectral coefficients obtained through a transformation and codes these coefficients using a hierarchical set partitioning algorithm. The spectral transformation used is the one proposed in [10], where the spectral coefficients are obtained by projecting the mesh geometry onto an orthonormal basis determined by mesh topology. The set partitioning method, which jointly codes the zeroes of these coefficients, treats the spectral coefficients for each of the three spatial coordinates with the right priority at all bit planes and realizes a truly embedded bitstream by implicit bit allocation. The experiments on common irregular meshes reveal that the distortion-rate performance of our coder is significantly superior to that of the spectral coder of [10].
In reference 1, image adaptive linear minimum mean squared error (LMMSE) filtering was proposed as an enhancement layer color image coding technique that exploited the statistical dependencies among the luminance/chrominance or Karhunen-Loeve Transform (KLT) coordinate planes of a lossy compressed color image to enhance the red, green, blue (RGB) color coordinate planes of that image. In the current work, we propose the independent design and application of LMMSE filters on the subbands of a color image as a low complexity solution. To this end, the support of the filter for each subband includes only those coordinates of the neighbors of the filtered subband coefficient that are sufficiently correlated with the corresponding coordinate of the filtered coefficient. Additionally, each subband LMMSE filter is selectively applied only on the high variance regions of the subband. Simulation results show that, at the expense of an insignificant increase in the overhead rate for the transmission of the filter coefficients and with about the same enhancement gain, subband LMMSE filtering offers a substantial complexity advantage over fullband LMMSE filtering.
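The two design steps described above, choosing a correlation-limited support and then fitting the filter taps, can be sketched as follows. This is a generic least-squares illustration, not the authors' implementation: the function names, the Pearson-correlation test, and the threshold value are all assumptions, and the taps are fitted by solving the sample normal equations.

```python
import math


def correlation(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b) if var_a > 0 and var_b > 0 else 0.0


def select_support(candidates, target, threshold):
    """Indices of candidate neighbors (one observation sequence each) whose
    absolute correlation with the target reaches the (hypothetical) threshold."""
    return [i for i, c in enumerate(candidates) if abs(correlation(c, target)) >= threshold]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def design_lmmse(supports, targets):
    """Taps w minimizing sum (t - w . s)^2 over training pairs (s, t),
    where each s is the support vector observed at one training position."""
    n = len(supports[0])
    r = [[sum(s[i] * s[j] for s in supports) for j in range(n)] for i in range(n)]
    p = [sum(s[i] * t for s, t in zip(supports, targets)) for i in range(n)]
    return solve(r, p)
```

Restricting the support shrinks the normal-equation system, which is where a complexity saving of the kind the abstract reports would come from in a sketch like this one.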
The rate constrained block matching algorithm (RC-BMA) introduced in this paper jointly minimizes the displaced frame difference (DFD) variance and the entropy or conditional entropy of the motion vectors. It is intended for determining motion vectors in low rate video coding applications, where the contribution of the motion vector rate to the overall coding rate might be significant. The motion vector rate versus DFD variance performance of RC-BMA employing blocks of size KxK is shown to be superior to that of the conventional minimum distortion block matching algorithm (MD-BMA) employing blocks of size 2Kx2K. Constraining the entropy or conditional entropy of the motion vectors in RC-BMA results in smoother and more organized motion vector fields than those output by MD-BMA. The motion vector rate of RC-BMA can also be fine tuned to a desired level for each frame by adjusting a single parameter.
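A rate-constrained search of this kind can be sketched as a Lagrangian cost per candidate vector. The sketch below is an assumption-laden illustration rather than the paper's algorithm: it uses the block's sum of squared differences as the distortion term, approximates the rate of a vector v by -log2 p(v) from a supplied probability table, and treats `lam` as the single tuning parameter. The names `prob`, `lam`, and the 1e-6 probability floor are hypothetical.

```python
import math


def block_sse(cur, ref, top, left, dy, dx, k):
    """Sum of squared displaced frame differences for one k x k block."""
    total = 0.0
    for y in range(k):
        for x in range(k):
            diff = cur[top + y][left + x] - ref[top + y + dy][left + x + dx]
            total += diff * diff
    return total


def rc_bma_vector(cur, ref, top, left, k, search, prob, lam):
    """Return the (dy, dx) minimizing distortion + lam * rate for one block.

    prob maps motion vectors to estimated probabilities; unseen vectors
    get a small floor probability so their rate stays finite.
    """
    h, w = len(ref), len(ref[0])
    best, best_cost = None, math.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose reference block leaves the frame.
            if top + dy < 0 or top + dy + k > h or left + dx < 0 or left + dx + k > w:
                continue
            rate = -math.log2(prob.get((dy, dx), 1e-6))
            cost = block_sse(cur, ref, top, left, dy, dx, k) + lam * rate
            if cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best
```

With `lam = 0` the search reduces to plain minimum distortion matching; raising `lam` pulls the choice toward vectors that are cheap to code, which is the mechanism behind smoother motion fields in a scheme of this type.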
It is demonstrated in this paper that the encoding complexity advantage of a variable-length tree-structured vector quantizer (VLTSVQ) can be enhanced, without significantly sacrificing coding performance, by encoding low dimensional subvectors of a source vector instead of the source vector itself at the nodes of the tree structure. The greedy tree growing algorithm for the design of such a vector quantizer codebook is outlined. Different ways of partitioning the source vector into its subvectors, and several criteria of interest for selecting the appropriate subvector for making the encoding decision at each node, are discussed. Techniques of tree pruning and resolution reduction are applied to obtain improved coding performance at the same low encoding complexity. Application of an orthonormal transformation such as the KLT or a subband transformation to the source, and the implication of defining the subvectors from orthogonal subspaces, are also discussed. Finally, simulation results on still images and an AR(1) source are presented to confirm our propositions.
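The encoding step described above, where each node decides on a low dimensional subvector only, can be illustrated with a small sketch. The tree layout, the dictionary keys, and the example codewords are hypothetical and chosen only for illustration; the greedy growing, pruning, and subvector selection criteria from the paper are not reproduced here.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode(tree, vector):
    """Walk the tree, comparing only each node's chosen subvector.

    Returns the variable-length bit path and the leaf reproduction vector.
    """
    node, bits = tree, ""
    while "leaf" not in node:
        lo, hi = node["slice"]
        sub = vector[lo:hi]
        tests = node["tests"]
        bit = 0 if sq_dist(sub, tests[0]) <= sq_dist(sub, tests[1]) else 1
        bits += str(bit)
        node = node["children"][bit]
    return bits, node["leaf"]


# Hypothetical unbalanced tree over 4-dimensional vectors: the root tests
# dimensions 0-1, its right child tests dimensions 2-3.
EXAMPLE_TREE = {
    "slice": (0, 2),
    "tests": [[0, 0], [10, 10]],
    "children": [
        {"leaf": [0, 0, 0, 0]},
        {
            "slice": (2, 4),
            "tests": [[0, 0], [10, 10]],
            "children": [{"leaf": [10, 10, 0, 0]}, {"leaf": [10, 10, 10, 10]}],
        },
    ],
}
```

Each node decision here costs two 2-dimensional distance computations instead of two 4-dimensional ones, which is the complexity saving the abstract refers to; the leaves still hold full-dimensional reproduction vectors.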
It has been proved recently that, for Gaussian sources with memory, an ideal subband split will produce a coding gain for scalar or vector quantization of the subbands. Following the methodology of the proofs, we outline a method for successively splitting the subbands of a source, one at a time, to obtain the largest coding gain. The subband with the largest theoretical rate reduction (TRR) is determined and split at each step of the decomposition process. The TRR is the difference between the rate in optimal encoding of N-tuples from a Gaussian source (or subband) and the rate for the same encoding of its subband decomposition. The TRR is a monotone increasing function of a so-called spectral flatness ratio, which involves the products of the eigenvalues of the source (subband) and subband decomposition covariance matrices of order N. These eigenvalues are estimated by the variances of the Discrete Cosine Transform coefficients, which approximate those of the optimal Karhunen-Loeve Transform. After the subband decomposition hierarchy or tree is determined through the criterion of maximal TRR, each subband is encoded with a variable rate entropy constrained vector quantizer. Optimal rate allocation to subbands is done with the BFOS algorithm, which does not require any source modelling. We demonstrate the benefit of using the criterion by comparing coding results on a two-level low-pass pyramidal decomposition with coding results on a two-level decomposition obtained using the criterion. For 60 MCFD (Motion Compensated Frame Difference) frames of the Salesman sequence, an average rate-distortion advantage of 0.73 dB and 0.02 bpp is obtained with the optimal decomposition over the low-pass pyramidal decomposition; for 30 FD (Frame Difference) frames of the Caltrain image sequence, the average advantage is 0.41 dB and 0.013 bpp.
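The greedy selection idea can be illustrated with a simplified stand-in for the TRR. The sketch below does not implement the paper's TRR or its spectral flatness ratio; it uses the classical high-rate subband coding gain for equal-size subbands (arithmetic mean over geometric mean of the subband variances) and the corresponding rate saving of half its base-2 logarithm in bits per sample. Weighting each candidate by its fraction of the total samples, and all names used, are assumptions.

```python
import math


def coding_gain(variances):
    """Arithmetic-to-geometric mean ratio of positive subband variances."""
    n = len(variances)
    arithmetic = sum(variances) / n
    geometric = math.exp(sum(math.log(v) for v in variances) / n)
    return arithmetic / geometric


def rate_reduction(variances):
    """High-rate saving in bits per sample from coding the subbands separately."""
    return 0.5 * math.log2(coding_gain(variances))


def best_split(candidates):
    """Pick the subband whose trial split saves the most rate overall.

    candidates maps a subband name to (fraction of all samples in that
    subband, variances of the subbands produced by its trial split).
    """
    return max(candidates, key=lambda name: candidates[name][0] * rate_reduction(candidates[name][1]))
```

A flat subband (equal variances after the trial split) yields a gain of 1 and a saving of 0 bits, so a criterion of this shape never prefers splitting it over a subband with uneven energy.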
The paper presents two different approaches to image sequence coding which exploit the spatial frequency statistics as well as the spatial and temporal correlation present in the video signal. The first approach is the pyramidal decomposition of the Motion Compensated Frame Difference (MCFD) signal in the frequency domain and the subsequent coding by unbalanced Tree Structured Vector Quantizers (TSVQ) designed to match the statistics of the frequency bands. The type of TSVQ used in this study possesses the advantage of low computational complexity with coding performance comparable to full-search vector quantization. The second approach is similar except that the order of motion estimation/compensation and pyramidal decomposition is interchanged.