CONTEXT: Almost all current stereoscopic displays suffer from crosstalk, one of the dominant factors degrading image quality and visual comfort on 3D display devices. To address this problem, it is worthwhile to quantify the amount of perceived crosstalk. OBJECTIVE: Crosstalk measurements are usually based on specific test patterns and ignore the effects of scene content. Subjective tests can evaluate the perceived crosstalk level for various scenes more accurately, but they are time-consuming and unsuitable for real-time applications. An objective metric that reliably predicts perceived crosstalk is therefore needed. An accurate objective assessment of crosstalk across different scene contents would benefit the development of crosstalk minimization and cancellation algorithms, which could improve the viewers' quality of experience. METHOD: A patterned retarder 3D display is used to present 3D images in our experiment. Taking the mechanism of this kind of device into account, crosstalk is simulated with image processing techniques that add different amounts of crosstalk from each image of a pair to the other. The literature shows that scene structure has a significant impact on perceived crosstalk, so we first extract the differences in structural information between the original and distorted image pairs using the Structural SIMilarity (SSIM) algorithm, which directly evaluates the structural changes between two complex-structured signals. The structural changes of the left and right views are computed separately and then combined into an overall distortion map. Under 3D viewing conditions, because of the added value of depth, the crosstalk of pop-out objects may be more perceptible. To model this effect, the depth map of the stereo pair is generated and the depth information is filtered by the distortion map.
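The crosstalk simulation step described above can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the code assumes a simple linear leakage model in which a fraction of each view is added to the other; the function name `simulate_crosstalk`, the parameter `level`, and the 8-bit intensity range are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def simulate_crosstalk(left, right, level):
    """Add a fraction `level` of each view to the other view.

    Assumption: a linear, spatially uniform leakage model with pixel
    intensities in the 8-bit range [0, 255]. A real patterned retarder
    display may need a more detailed model.
    """
    left = np.asarray(left, dtype=np.float64)
    right = np.asarray(right, dtype=np.float64)
    if left.shape != right.shape:
        raise ValueError("left and right views must have the same shape")
    if not 0.0 <= level <= 1.0:
        raise ValueError("level must lie in [0, 1]")

    # Each distorted view is the intended view plus leaked light
    # from the unintended view, clipped to the valid intensity range.
    left_distorted = np.clip(left + level * right, 0.0, 255.0)
    right_distorted = np.clip(right + level * left, 0.0, 255.0)
    return left_distorted, right_distorted


if __name__ == "__main__":
    # Tiny synthetic stereo pair (hypothetical data, not from the experiment).
    left_view = np.full((2, 2), 100.0)
    right_view = np.full((2, 2), 200.0)
    left_x, right_x = simulate_crosstalk(left_view, right_view, 0.1)
    print(left_x)
    print(right_x)
```

Under this assumed model, the distorted pair produced for each crosstalk level would then be compared against the original pair to obtain the per-view structural changes.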
Moreover, human attention is an important factor in crosstalk assessment: when viewing 3D content, perceptually salient regions are highly likely to be major contributors to the quality of experience. To take this into account, perceptually significant regions are extracted, and a spatial pooling technique combines the structural distortion map, depth map, and visual saliency map to predict the perceived crosstalk more precisely. To verify the performance of the proposed crosstalk assessment metric, subjective experiments are conducted with 24 participants viewing and rating 60 stimuli (5 scenes × 4 crosstalk levels × 3 camera distances). After outlier removal and statistical processing, the correlation with the subjective test results is examined using the Pearson and Spearman rank-order correlation coefficients. The proposed method is also compared with two traditional 2D metrics, PSNR and SSIM. The objective score is mapped to the subjective scale using a nonlinear fitting function to directly evaluate the performance of the metric. RESULTS: The evaluation results demonstrate that the proposed metric is highly correlated with the subjective scores when compared with the existing approaches. With a Pearson coefficient of 90.3%, the proposed metric is promising for objective evaluation of perceived crosstalk. NOVELTY: The main goal of our paper is to introduce an objective metric for stereo crosstalk assessment. The novel contributions are twofold. First, an appropriate simulation of crosstalk is developed by considering the characteristics of patterned retarder 3D displays. Second, an objective crosstalk metric based on a visual attention model is introduced.
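The correlation analysis mentioned under METHOD and RESULTS can be sketched as follows. This is a minimal, standard-library illustration of the Pearson and Spearman rank-order coefficients; the helper names and the sample scores are hypothetical, and the nonlinear fitting step that maps objective scores to the subjective scale is not included because the abstract does not specify its form.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical objective scores and mean opinion scores (not real data).
    objective = [0.12, 0.35, 0.40, 0.58, 0.71]
    subjective = [1.4, 2.1, 2.0, 3.6, 4.2]
    print("Pearson:", pearson(objective, subjective))
    print("Spearman:", spearman(objective, subjective))
```

Pearson measures how linear the relation between predicted and subjective scores is, while Spearman only checks that the two orderings agree, which is why the two are commonly reported together.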