During real-time video communications over packet networks, various degradations can occur along the spatial or temporal axes of the signal. The end user may perceive a loss of image clearness-sharpness and impaired fluidity. The overall perceived degradation can be seen as the combined contribution of both perceptual axes. A set of subjective quality assessment tests revealed a significant perceptual interaction between spatial and temporal quality. We show that, at least under our experimental conditions, the overall visual quality can be estimated from independent spatial (clearness-sharpness) and temporal (fluidity) quality assessments. Four visual quality prediction models are presented. The models' predictions correlate significantly with the mean opinion scores of a group of observers. The best-performing model takes into account nonlinear characteristics of human assessment. Our results lead to a better understanding of spatiotemporal interactions and could be useful for the design of automatic video quality metrics.
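As a rough illustration of estimating overall quality from separate spatial and temporal scores, the sketch below compares two candidate combination rules by their Pearson correlation with mean opinion scores. The rules (`additive`, `multiplicative`), their parameters, and the assumed 1 to 5 rating scale are hypothetical and chosen only for illustration; they are not the four models referred to above.

```python
import math


def additive(s, t, w=0.5):
    """Hypothetical rule: weighted average of spatial score s and temporal score t."""
    return w * s + (1.0 - w) * t


def multiplicative(s, t, scale=5.0):
    """Hypothetical rule: product of the scores, rescaled to an assumed 1-5 range.

    A low score on either axis pulls the overall score down, which is one
    simple way to express an interaction between the two axes.
    """
    return s * t / scale


def pearson(xs, ys):
    """Pearson linear correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def score_model(model, spatial, temporal, overall_mos):
    """Correlation between a rule's predictions and observed overall scores."""
    predictions = [model(s, t) for s, t in zip(spatial, temporal)]
    return pearson(predictions, overall_mos)
```

Calling `score_model(additive, spatial, temporal, mos)` and `score_model(multiplicative, spatial, temporal, mos)` on the same subjective data gives one simple basis for comparing candidate rules.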
Over the past few years there has been increasing interest in real-time video services over packet networks. When
considering quality, it is essential to quantify user perception of the received sequence. Severe motion discontinuities are
one of the most common degradations in video streaming. The end user perceives jerky motion when the
discontinuities are uniformly distributed over time, and an instantaneous break in fluidity when the motion loss
is isolated or irregularly distributed. Bit rate adaptation techniques, transmission errors in the packet network or
the restitution strategy can cause this perceived jerkiness. In this paper we present a psychovisual experiment
performed to quantify the effect of sporadically dropped pictures on the overall perceived quality. First, the perceptual
detection thresholds of generated temporal discontinuities were measured. Then, the quality function was estimated for
a single burst of dropped frames of varying duration. Finally, a set of tests was performed to quantify the effect of
several impairments distributed over time. We found that the detection thresholds depend on content, duration and
motion. The assessment results show how quality is impaired by a single burst of dropped frames in a 10 s
sequence. The effect of several bursts of discarded frames, irregularly distributed over time, is also discussed.
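To make the notion of a burst of dropped frames concrete, the following minimal sketch simulates what a viewer would see if the last correctly displayed frame were repeated for the length of each burst. The freeze-on-last-frame behavior, the function names, and the `(start, length)` burst format are assumptions made for illustration; the text above does not specify how the discontinuities were generated.

```python
def drop_bursts(n_frames, bursts):
    """Return the index of the source frame shown at each display instant.

    bursts is a list of (start, length) pairs, in frames. Each dropped frame
    is replaced by the last frame displayed before the burst (assumed
    freeze behavior).
    """
    displayed = list(range(n_frames))
    for start, length in sorted(bursts):
        if start <= 0:
            raise ValueError("a burst needs at least one displayed frame before it")
        for i in range(start, min(start + length, n_frames)):
            # Reuse whatever was on screen just before the burst began.
            displayed[i] = displayed[start - 1]
    return displayed


def burst_duration_ms(length, fps):
    """Duration of a burst of `length` dropped frames at `fps` frames per second."""
    return 1000.0 * length / fps
```

For example, `drop_bursts(10, [(3, 2)])` freezes frame 2 over two display instants, and at an assumed 25 frames per second a five-frame burst lasts `burst_duration_ms(5, 25)` milliseconds.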
In video sequences, scene cuts produce a temporal masking effect on several kinds of artifacts. This reduction in the
temporal sensitivity of the human visual system can occur before (backward masking) and after (forward masking) scene
cuts. Related studies reported significant forward masking in the first 30 to 100 ms following a scene change,
depending on the nature of the impairment and the picture content. Backward masking at scene cuts seems to be less
significant. In this paper we present the results of a psychovisual experiment performed to characterize the temporal
masking effect on discontinuities caused by dropped frames in the vicinity of scene cuts. The forward and backward
masking effects were estimated for a single burst of discarded frames of different durations. A four-alternative
forced-choice (4AFC) psychophysical method was used to evaluate the detection thresholds. The test was carried out using
natural video content. Our forward masking results are consistent with those reported in the literature,
even though the test conditions were quite different. However, the backward masking effect on frame dropping perception is more
significant than the forward masking effect.
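In a four-alternative forced-choice task, an observer who detects nothing is still correct one time in four, so detection thresholds are usually read from the proportion of correct answers relative to that chance level. The sketch below shows one common way to do this. The 62.5% criterion (midway between chance and perfect performance) and the linear interpolation between tested durations are assumptions for illustration, not the procedure reported above.

```python
def correct_for_guessing(p_correct, n_alternatives=4):
    """Map a proportion of correct answers to an estimated detection probability.

    With n alternatives, chance performance is 1/n; values at or below
    chance map to 0.
    """
    chance = 1.0 / n_alternatives
    return max(0.0, (p_correct - chance) / (1.0 - chance))


def detection_threshold(durations_ms, p_correct, criterion=0.625):
    """Interpolate the burst duration at which p_correct reaches the criterion.

    Returns None if the criterion is never crossed in the tested range.
    The default criterion (0.625) is an assumed convention for 4AFC.
    """
    points = sorted(zip(durations_ms, p_correct))
    for (d0, p0), (d1, p1) in zip(points, points[1:]):
        if p0 < criterion <= p1:
            return d0 + (criterion - p0) * (d1 - d0) / (p1 - p0)
    return None
```

Running `detection_threshold` separately on data collected before and after a scene cut would give one threshold per condition, which is a simple way to compare backward and forward masking.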