The quality of experience of 3D images and video is a multidimensional concept that depends on 2D image quality, depth quantity and visual comfort. The relationship between these parameters is not yet clearly defined. From this perspective, we aim to understand how texture complexity, depth quantity and visual comfort influence the way people observe 3D content in comparison with 2D. Six scenes with different structural parameters were generated using Blender software. For these six scenes, two parameters were varied: texture complexity and the amount of depth, the latter by changing the camera baseline and the convergence distance at the shooting side. Our study was conducted using an eye-tracker and a 3DTV display. During the eye-tracking experiment, each observer freely examined images with different depth levels and texture complexities. To avoid memory bias, we ensured that each observer saw each scene content only once. Collected fixation data were used to build saliency maps and to analyze differences between 2D and 3D conditions. Our results show that the introduction of disparity shortened saccade length; however, fixation durations remained unaffected. An analysis of the saliency maps did not reveal any differences between 2D and 3D conditions for the viewing duration of 20 s. When the whole period was divided into smaller intervals, we found that for the first 4 s the introduced disparity contributed to the selection of salient regions. However, this contribution is minimal when judged by the correlation between saliency maps. We also did not find that visual comfort or discomfort had any influence on visual attention. We believe that existing metrics and methods are depth insensitive and therefore do not reveal such differences. Based on the analysis of heat maps and paired t-tests of inter-observer visual congruency values, we deduced that the selected areas of interest depend on texture complexity.
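As an illustration of the saliency-map comparison described above, the following is a minimal sketch, not the procedure used in the study. It assumes that a saliency map is obtained by summing an isotropic Gaussian at each fixation point and that two maps are compared with the Pearson correlation coefficient. The function names, the `sigma` value and the example fixation coordinates are hypothetical.

```python
import numpy as np


def fixation_saliency_map(fixations, height, width, sigma=30.0):
    """Build a saliency map by summing a Gaussian centred on each fixation.

    `fixations` is an iterable of (x, y) pixel coordinates. `sigma` (in
    pixels) is an assumed spread, not a value taken from the study.
    The map is normalised so that its values sum to 1.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    saliency = np.zeros((height, width), dtype=float)
    for x, y in fixations:
        saliency += np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
    total = saliency.sum()
    return saliency / total if total > 0 else saliency


def pearson_cc(map_a, map_b):
    """Pearson correlation coefficient between two maps of equal shape."""
    a = map_a.ravel() - map_a.mean()
    b = map_b.ravel() - map_b.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical fixations for the same scene viewed in 2D and in 3D.
    fixations_2d = [(120, 80), (300, 150), (310, 160)]
    fixations_3d = [(125, 85), (290, 140), (400, 200)]
    map_2d = fixation_saliency_map(fixations_2d, height=270, width=480)
    map_3d = fixation_saliency_map(fixations_3d, height=270, width=480)
    print("CC between 2D and 3D saliency maps:", pearson_cc(map_2d, map_3d))
```

To examine shorter intervals such as the first 4 s, the same functions could be applied to the subset of fixations whose onset falls within each interval.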