The use of 3D video is growing in several fields, such as entertainment, military simulation, and medical applications. However, recording, transmitting, and processing 3D video is prone to errors that produce artifacts which may affect the perceived quality. A challenging task today is the definition of a new metric able to predict the perceived quality with low computational complexity, so that it can be used in real-time applications. Research in this field is very active due to the complexity of analysing the influence of stereoscopic cues. In this paper we present a novel stereoscopic metric that combines relevant features to predict the subjective quality rating more accurately.
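As a minimal sketch of such a feature-combination metric: the linear form and the least-squares fit below are illustrative assumptions, not the metric proposed in the paper, and the feature values are whatever per-sequence measurements one chooses to extract.

```python
import numpy as np

def fit_quality_model(features, mos):
    """Least-squares fit of weights mapping per-sequence feature vectors
    (rows of `features`, shape (N, F)) to subjective scores `mos` (N,)."""
    X = np.hstack([features, np.ones((features.shape[0], 1))])  # add bias term
    w, *_ = np.linalg.lstsq(X, mos, rcond=None)
    return w

def predict_quality(features, w):
    """Predicted quality ratings for new sequences."""
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    return X @ w
```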
We present a methodology to assess the objective visual quality of multi-view and light-field displays. We consider the display as a signal processing channel and study its ability to deliver a signal while introducing negligible distortions. We start by creating a model of a display, which represents its output as a set of rays in a spatial-angular (x, y, θ) coordinate space. We created a simulation framework that can use the model and render the expected output of the display for a given observation position. The framework employs an image analysis block, which aims to predict the perceptual effect of the introduced distortions and judge whether the original signal is still predominant in the output. Using the framework, we can try a large set of test signals against the display model and find the ones which are represented with sufficiently low distortion levels. We use test signals that contain gradually changing frequency components, and use the results of the tests to build the so-called 3D passband of the display. The 3D passband can be used as a quantitative measure of the display's ability to faithfully represent image details. The size of the passband is indicative of the spatial and angular resolution of the display.
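A minimal Python sketch of this probing procedure, assuming a user-supplied `display_model` callable, integer test frequencies in cycles per image (below Nyquist), and an illustrative energy-ratio threshold rather than the paper's exact criterion:

```python
import numpy as np

def measure_passband(display_model, freqs, size=256, threshold_db=-20.0):
    """For each test frequency, render a sinusoidal grating through the
    display model and keep the frequencies whose output spectrum is still
    dominated by the original component."""
    passband = []
    x = np.arange(size)
    for f in freqs:
        test = 0.5 + 0.5 * np.sin(2 * np.pi * f * x / size)  # 1-D grating
        out = display_model(test)             # simulated display output
        spec = np.abs(np.fft.rfft(out - out.mean()))
        k = int(round(f))                     # FFT bin of the test frequency
        distortion = np.delete(spec, k).sum() + 1e-12
        ratio_db = 20 * np.log10(spec[k] / distortion)
        if ratio_db > threshold_db:
            passband.append(f)
    return passband
```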
We created two display models to serve as example cases for our framework. One model represents a typical multiview display, and the other represents a typical projection-based light-field display. We estimate the passband for each display model and present the results. The resulting passbands suggest that, for a given "ray-budget", the ray distribution typical for light-field displays results in a wider and more uniform passband than in the case of a multiview display.
3D video content is captured and created mainly in high resolution, targeting big cinema or home TV screens. For 3D mobile devices, equipped with small-size auto-stereoscopic displays, such content has to be properly repurposed, preferably in real-time. The repurposing requires not only spatial resizing but also properly maintaining the output stereo disparity, as it should deliver realistic, pleasant, and harmless 3D perception.
In this paper, we propose an approach to adapt the disparity range of the source video to the comfort disparity zone of the target display. To achieve this, we adapt the scale and the aspect ratio of the source video. We aim at maximizing the disparity range of the retargeted content within the comfort zone, while minimizing the letterboxing of the cropped content.
The proposed algorithm consists of five stages. First, we analyse the display profile, which characterises what 3D content can be comfortably observed on the target display. Then, we perform fast disparity analysis of the input stereoscopic content. Instead of returning a dense disparity map, it returns an estimate of the disparity statistics (min, max, mean, and variance) per frame. Additionally, we detect scene cuts, where sharp transitions in disparities occur. Based on the estimated input and desired output disparity ranges, we derive the optimal cropping parameters and scale of the cropping window which would yield the targeted disparity range and minimize the area of cropped and letterboxed content. Once the rescaling and cropping parameters are known, we perform a resampling procedure using spline-based and perceptually optimized resampling (anti-aliasing) kernels, which also have a very efficient computational structure. Perceptual optimization is achieved by adjusting the cut-off frequency of the anti-aliasing filter to the throughput of the target display.
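A simplified sketch of the disparity-statistics and scaling stages, assuming sparse disparity samples are already available and that uniform spatial rescaling scales disparities by the same factor (a simplification of the full crop-and-scale optimization):

```python
import numpy as np

def disparity_stats(disparities):
    """Per-frame disparity statistics (min, max, mean, variance) from a
    sparse set of disparity samples; no dense map is required."""
    d = np.asarray(disparities, dtype=float)
    return d.min(), d.max(), d.mean(), d.var()

def retarget_scale(d_min, d_max, comfort_min, comfort_max):
    """Uniform scale factor that fits the measured disparity range into
    the display's comfort zone."""
    src = d_max - d_min
    dst = comfort_max - comfort_min
    return min(1.0, dst / src) if src > 0 else 1.0
```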
We present an approach to measure and model the parameters of the human point-of-gaze (PoG) in 3D space. Our model considers the following three parameters: the position of the gaze in 3D space, the volume encompassed by the gaze, and the time for the gaze to arrive at the desired target.
Extracting the 3D gaze position from binocular gaze data is hindered by three problems. The first problem is the lack of convergence: due to microsaccadic movements, the optical lines of both eyes rarely intersect at a point in space. The second problem is resolution: the combination of short observation distance and the limited comfort disparity zone typical for a mobile 3D display does not allow the depth of the gaze position to be reliably extracted. The third problem is measurement noise: due to the limited display size, the noise range is close to the range of properly measured data.
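For the convergence problem, a common remedy (shown here as an illustrative assumption, not necessarily the paper's exact method) is to take the midpoint of the shortest segment between the two non-intersecting gaze rays:

```python
import numpy as np

def pog_from_gaze_rays(p1, d1, p2, d2):
    """Estimate the 3D point-of-gaze as the midpoint of the shortest segment
    between the two gaze rays. p1, p2: eye positions; d1, d2: gaze directions."""
    p1, d1 = np.asarray(p1, float), np.asarray(d1, float)
    p2, d2 = np.asarray(p2, float), np.asarray(d2, float)
    r = p2 - p1
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    denom = a * c - b * b
    if abs(denom) < 1e-12:              # rays (almost) parallel: no convergence
        return None
    t1 = (c * (r @ d1) - b * (r @ d2)) / denom   # closest point on ray 1
    t2 = (b * (r @ d1) - a * (r @ d2)) / denom   # closest point on ray 2
    return 0.5 * ((p1 + t1 * d1) + (p2 + t2 * d2))
```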
We have developed a methodology which allows us to suppress most of the measurement noise. This allows us to estimate the typical time needed for the point-of-gaze to travel in the x, y, or z direction. We identify three temporal properties of the binocular PoG. The first is the reaction time, the minimum time for the vision to react to a change in stimulus position, measured as the time between the event and the moment the PoG leaves the proximity of the old stimulus position. The second is the travel time of the PoG between the old and new stimulus positions. The third is the time-to-arrive, which combines the reaction time, the travel time, and the time required for the PoG to settle at the new position.
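A simplified sketch of how these three quantities could be read off a PoG trace; the proximity radius and the first-sample arrival criterion (which ignores settling) are illustrative assumptions:

```python
import numpy as np

def pog_timing(t, pog, t_event, old_pos, new_pos, radius):
    """Split a binocular PoG trace around a stimulus jump at t_event into
    reaction time, travel time, and time-to-arrive. `pog` is (N, 3)."""
    d_old = np.linalg.norm(pog - old_pos, axis=1)
    d_new = np.linalg.norm(pog - new_pos, axis=1)
    after = t >= t_event
    left = np.flatnonzero(after & (d_old > radius))      # PoG left old target
    arrived = np.flatnonzero(after & (d_new <= radius))  # PoG inside new one
    if len(left) == 0 or len(arrived) == 0:
        return None                       # gaze never completed the move
    t_leave, t_arrive = t[left[0]], t[arrived[0]]
    return t_leave - t_event, t_arrive - t_leave, t_arrive - t_event
```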
We present methods for filtering PoG outliers, for deriving the PoG center from binocular eye-tracking data, and for calculating the gaze volume as a function of the distance between the PoG and the observer. As an outcome of our experiments, we present binocular heat maps aggregated over all observers who participated in a viewing test. We also show the mean values of all temporal properties, separately for the x, y, and z directions, averaged over all observers. We show the typical size of a binocular area of interest for a portable autostereoscopic display, as well as the typical time the 3D vision needs to react to sudden changes in a 3D scene.
We perform a comparative analysis of the visual quality of multiple 3D displays: seven portable ones and a large 3D television set. We discuss two groups of parameters that influence the perceived quality of mobile 3D displays. The first group is related to the optical parameters of the displays, such as crosstalk or the size of the sweet spots. The second group includes content-related parameters, such as the objective and subjective comfort disparity ranges suitable for a given display. We identify eight important parameters to be measured, and for each parameter we present the measurement methodology and give comparative results for each display. Finally, we discuss the ability of each display to visualize downscaled stereoscopic HD content with sufficient visual quality.
The paper presents a hybrid approach to studying the user's experienced quality of 3D visual content on mobile autostereoscopic displays. It combines extensive subjective tests with the collection and objective analysis of eye-tracking data. 3D cues which are significant for mobile devices are simulated in the generated 3D test content. The methodology for conducting the subjective quality evaluation includes hybrid data collection of quantitative quality preferences, qualitative impressions, and binocular eye-tracking. We present early results of the subjective tests along with eye-movement reaction times, areas of interest, and heat maps obtained from the raw eye-tracking data after statistical analysis. The study contributes to the question of what is important to visualize on portable auto-stereoscopic displays and how to maintain and visually enhance the quality of 3D content for such displays.
Multiview displays suffer from two common artifacts: Moiré, caused by aliasing, and ghosting, caused by crosstalk. By measuring the angular brightness function of each TFT element we create a so-called brightness mask, which allows us to simulate the display output for a given input image. We consider the multiview display as an image processing channel and model the artifacts as distortions of the input signal. We test the channel by using a set of signals with various frequency components as input and analyzing the output in the frequency domain. We derive the so-called passband region of the display, where the distortions introduced to the input signals are under a certain threshold. Then, we extend the simulations to include input signals with varying disparity and obtain multiple passbands, one for each disparity level. We approximate each passband with a rectangle and store the height and width of that rectangle in a lookup table.
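A minimal sketch of how the brightness mask can drive the output simulation; for brevity it collapses the per-TFT-element measurements to one angular brightness profile per view, which is a simplification of the measurement described above:

```python
import numpy as np

def simulate_display_output(views, masks, angle_idx):
    """Perceived image at one observation angle: each view image is weighted
    by the angular brightness of the subpixels carrying it.
    views: (V, H, W) view images; masks: (V, A) brightness of view v at angle a."""
    w = masks[:, angle_idx]                            # per-view weights here
    return np.tensordot(w / w.sum(), views, axes=1)    # weighted blend -> (H, W)
```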
We propose an artifact mitigation framework which can be used for real-time processing of textures with known apparent depth. The framework gives the user the ability to set a so-called "3D-sharpness" parameter, which controls the trade-off between visibility of details and presence of artifacts. The "3D-sharpness" parameter determines what level of distortion is allowed in the final image, regardless of its disparity. The framework uses the approximated width and height of the passband areas in order to design an anti-aliasing filter that is optimal for the needed disparity and the desired distortion level. We discuss a methodology for filter design, and show example results based on measurements of an 8-view display.
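A sketch of the filter-design step, assuming a lookup table from disparity level to passband rectangle and a standard scipy FIR design; the LUT values and the sharpness-to-cut-off mapping are placeholders, not the measured 8-view data:

```python
import numpy as np
from scipy.signal import firwin

def antialias_filter(disparity, passband_lut, sharpness, numtaps=15):
    """Pick the passband rectangle stored for the nearest disparity level and
    design a 1-D low-pass filter whose cut-off is scaled by the user's
    "3D-sharpness" setting in [0, 1]."""
    nearest = min(passband_lut, key=lambda d: abs(d - disparity))
    width, _height = passband_lut[nearest]          # height would bound a 2nd dim
    cutoff = np.clip(sharpness * width, 1e-3, 0.999)  # normalized, Nyquist = 1
    return firwin(numtaps, cutoff)                  # linear-phase FIR taps

# Placeholder LUT: disparity level -> (passband width, height), normalized.
lut = {0: (0.9, 0.9), 1: (0.6, 0.8), 2: (0.4, 0.7)}
taps = antialias_filter(disparity=1.3, passband_lut=lut, sharpness=0.8)
```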
Mobile 3D television is a new form of media experience, which combines the freedom of mobility with the greater realism of presenting visual scenes in 3D. Achieving this combination is a challenging task, as the greater viewing experience has to be achieved with the limited resources of the mobile delivery channel, such as limited bandwidth and a power-constrained handheld player. This challenge creates the need for tight optimization of the overall mobile 3DTV system. The presence of depth and compression artifacts in the played 3D video are two major factors that influence the viewer's subjective quality of experience and satisfaction. The primary goal of this study has been to examine the influence of varying depth and compression artifacts on the subjective quality of experience for mobile 3D video content. In addition, the influence of the studied variables on simulator sickness symptoms has been studied, and a vocabulary-based descriptive quality-of-experience study has been conducted for a subset of variables in order to understand the perceptual characteristics in detail. In the experiment, 30 participants evaluated the overall quality of different 3D video contents with varying depth ranges, compressed with varying quantization parameters. The test video content was presented on a portable autostereoscopic LCD display with a horizontal double-density pixel arrangement. The results of the psychometric study indicate that compression artifacts are a dominant factor determining the quality of experience compared to varying depth range. More specifically, contents with strong compression were rejected by the viewers and deemed unacceptable. The results of the descriptive study confirm the dominance of visible spatial artifacts alongside the added value of depth for artifact-free content. The level of visual discomfort was determined not to be disturbing.
We investigate the effect of camera de-calibration on the quality of depth estimation. A dense depth map is a format particularly suitable for mobile 3D capture, being scalable and screen-independent. However, in a real-world scenario cameras might move from their designated positions, e.g. due to vibrations or temperature-induced bending. For the experiments, we create a test framework, described in the paper. We investigate how such mechanical changes affect four different stereo-matching algorithms. We also assess how different geometric corrections (none, motion-compensation-like, full rectification) affect the estimation quality, and how much offset can still be compensated with a "crop" over a larger CCD. Finally, we show how the estimated camera pose change (described by the essential matrix E) relates to stereo-matching quality, which can be used as a "rectification quality" measure.
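A sketch of the pose-change estimate using OpenCV; treating the residual rotation angle as the rectification-quality indicator is our illustrative reading of the measure, and `pts1`/`pts2` are matched point arrays with `K` the camera intrinsics:

```python
import cv2
import numpy as np

def pose_change(pts1, pts2, K):
    """Estimate the relative camera pose change from matched points of a
    stereo pair; its magnitude indicates how far the pair is from calibrated."""
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    # Residual rotation angle (radians): 0 for a perfectly rectified pair
    angle = np.arccos(np.clip((np.trace(R) - 1) / 2, -1.0, 1.0))
    return R, t, angle
```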
We identify, categorize, and simulate artifacts which might occur during the delivery of stereoscopic video to mobile devices. We consider the stages of the 3D video delivery dataflow: content creation, conversion to the desired format (multiview or source-plus-depth), coding/decoding, transmission, and visualization on a 3D display. Human 3D vision works by assessing various depth cues: accommodation, binocular depth cues, pictorial cues, and motion parallax. As a consequence, any artifact which modifies these cues impairs the quality of a 3D scene.
The perceptibility of each artifact can be estimated through subjective tests. The material for such tests needs to contain various artifacts with different amounts of impairment. We present a system for simulating these artifacts. The artifacts are organized in groups with similar origins, and each group is simulated by a block in a simulation channel. The channel introduces the following groups of artifacts: sensor limitations, geometric distortions caused by camera optics, spatial and temporal misalignments between video channels, spatial and temporal artifacts caused by coding, transmission losses, and visualization artifacts. For the case of the source-plus-depth representation, artifacts caused by format conversion are added as well.
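A minimal sketch of such a block-based simulation channel; the two example blocks (vertical misalignment and symmetric crosstalk) are simple stand-ins for the artifact groups listed above:

```python
import numpy as np

def channel(left, right, blocks):
    """Run a stereo pair through a chain of artifact-simulation blocks;
    each block is a function (left, right) -> (left, right)."""
    for block in blocks:
        left, right = block(left, right)
    return left, right

def vertical_misalignment(shift):   # spatial misalignment between channels
    return lambda L, R: (L, np.roll(R, shift, axis=0))

def crosstalk(c):                   # symmetric view leakage at visualization
    return lambda L, R: ((1 - c) * L + c * R, (1 - c) * R + c * L)

left, right = np.random.rand(2, 480, 640)   # stand-in stereo pair
left, right = channel(left, right, [vertical_misalignment(2), crosstalk(0.05)])
```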
In this contribution, we present two GPU-optimized algorithms for displaying the frames of a 2D-plus-Z stream on a multiview 3D display. We aim at mitigating the crosstalk artifacts, which are inherent to such displays. In our approach, a 3D mesh is generated using the given depth map, then textured with the given 2D scene and properly interdigitized on the screen. We make use of the GPU's built-in libraries to perform these operations in a fast manner. To reduce the global crosstalk presence, we investigate two approaches. In the first approach, the 2D image is appropriately smoothed before texturing. The smoothing is done in the horizontal direction by a 1-D filter bank driven by the given depth map. Such smoothing provides the needed anti-aliasing in the same filtering step. In the second approach, we introduce a higher number of properly blended virtual views than the number of views supported by the display, and demonstrate that this is equivalent to a smoothing operation. We provide experimental results and discuss the performance and computational complexity of the two approaches. While the first approach is more appropriate for higher-resolution displays equipped with newer graphics accelerators, the latter approach is more general and suitable for lower-resolution displays and a wider range of graphics hardware.
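A CPU-side sketch of the first approach, with a small bank of box filters standing in for the paper's 1-D kernels and the depth map assumed normalized to [0, 1]:

```python
import numpy as np

def depth_driven_smoothing(image, depth, max_radius=4):
    """Horizontal low-pass filtering whose strength grows with apparent
    depth: a filter from a small bank is selected per pixel."""
    # Precompute horizontally box-filtered copies for each radius
    bank = [image]
    for r in range(1, max_radius + 1):
        k = np.ones(2 * r + 1) / (2 * r + 1)
        bank.append(np.apply_along_axis(
            lambda row: np.convolve(row, k, 'same'), 1, image))
    # Map normalized depth to a filter index and pick per pixel
    idx = np.clip((depth * max_radius).astype(int), 0, max_radius)
    out = np.empty_like(image)
    for i, filtered in enumerate(bank):
        out[idx == i] = filtered[idx == i]
    return out
```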
We propose an image registration technique to be implemented on mobile devices equipped with cameras. We address the limited computational power and low-quality optics of such devices and aim at designing a registration algorithm which is fast, robust with respect to noise, and allows for the correction of optical distortions. We favor a feature-based approach, consisting of feature extraction, feature filtering, feature matching, and transformation estimation. In our application, the transformation estimation is robust to local distortions and is accurate enough to allow for subsequent super-resolution on the registered images. The performance of the technique is demonstrated in a fixed-point implementation on the TMS320C5510 DSP.
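A floating-point sketch of the feature-based pipeline using OpenCV; ORB and a RANSAC-estimated homography are stand-ins for the paper's (unspecified) detector and transformation model, not its fixed-point DSP implementation:

```python
import cv2
import numpy as np

def register(img1, img2):
    """Feature extraction, filtering, matching, and robust transformation
    estimation between two grayscale images."""
    orb = cv2.ORB_create()
    k1, d1 = orb.detectAndCompute(img1, None)
    k2, d2 = orb.detectAndCompute(img2, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(d1, d2), key=lambda m: m.distance)[:200]
    src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)  # reject outliers
    return H
```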
Motion estimation (ME) is the most time-consuming part of contemporary video compression algorithms and standards. In recent years, certain transform-domain "phase-correlation" ME algorithms based on complex-valued wavelet transforms have been developed to achieve lower complexity than previous approaches.
In the present paper we describe an implementation of the basic phase-correlation ME techniques on a fixed-point dual-core processor architecture such as the TI OMAP. We aim at achieving low computational complexity and algorithm stability without affecting the estimation accuracy.
The first stage of our ME algorithm is a multiscale complex-valued transform based on all-pass filters. We have developed wave digital filter (WDF) structures to ensure better performance and higher robustness in fixed-point arithmetic environments. For higher efficiency, the structures utilize some of the dedicated filtering instructions of the 'C5510 DSP part of the dual-core processor.
The calculation of motion vectors is performed using a maximum phase-correlation criterion. The minimum subband squared difference is estimated for every subband level of the decomposition. To minimize the number of real-time computations, we have adapted this algorithm to the functionality of the hardware extensions present in the 'C5510.
We consider our approach quite promising for realizing video coding standards on mobile devices, as many of them utilize fixed-point DSP architectures.
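A floating-point sketch of the basic phase-correlation step (the paper's contribution is its fixed-point, wavelet-domain realization on the dual-core architecture):

```python
import numpy as np

def phase_correlation_shift(block1, block2):
    """Integer displacement between two image blocks: the peak of the inverse
    FFT of the normalized cross-power spectrum gives the shift d such that
    block1(x) is approximately block2(x - d)."""
    cross = np.fft.fft2(block1) * np.conj(np.fft.fft2(block2))
    cross /= np.abs(cross) + 1e-12          # keep phase only
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Map wrap-around indices to signed shifts
    h, w = corr.shape
    if dy > h // 2: dy -= h
    if dx > w // 2: dx -= w
    return dy, dx
```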