This work presents evidence supporting the hypothesis that observers tend to evaluate the same properties of
given skin-lesion images very differently. Results from previous experiments are compared with new ones,
obtained by giving the users additional prototypical visual cues during their evaluation trials. Each
property (colour, colour uniformity, asymmetry, border regularity, roughness of texture) had to be evaluated
on a 0-10 scale, with both linguistic descriptors and visual references at each end and in the middle (e.g.
light/medium/dark for colour). A set of 22 images covering different clinical diagnoses was used in the
comparison with previous results. Statistical testing showed that the inclusion of the visual anchors reduced
the variability of the grading for some of the properties only in a few test images. Despite this reduction,
the average variance of each property remained high even after the inclusion of the visual anchors. Considered
property by property, the average variance changed significantly for the roughness of texture, where the
visual references increased the variability. From these results we conclude that the variance of
the answers observed in the previous experiments was not due to the lack of a standard definition of the extrema
of the scale, but rather to a high variability in the way observers perceive and understand skin-lesion images.
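
The comparison described above can be illustrated with a minimal sketch. It assumes that the grades for one property are stored per image as lists of 0-10 values, one list for the trials without visual anchors and one for the trials with them. The image names, the grades and the helper names (per_image_variance, variance_ratio) are hypothetical and serve only as illustration; the abstract does not state which statistical test was used, so the variance ratio below is a stand-in and not necessarily the procedure applied in the study.

```python
from statistics import mean, variance


def per_image_variance(grades_by_image):
    """Sample variance of the 0-10 grades that observers gave to each image."""
    return {image: variance(grades) for image, grades in grades_by_image.items()}


def variance_ratio(grades_without, grades_with):
    """Ratio of sample variances; a value above 1 means less spread with anchors."""
    return variance(grades_without) / variance(grades_with)


# Hypothetical grades for a single property (e.g. colour), NOT data from the study.
without_anchors = {"img01": [2, 5, 8, 9, 3], "img02": [4, 5, 6, 5, 4]}
with_anchors = {"img01": [4, 5, 6, 6, 5], "img02": [2, 5, 8, 4, 7]}

var_without = per_image_variance(without_anchors)
var_with = per_image_variance(with_anchors)

# Average variance of the property over all images, for each condition.
mean_var_without = mean(list(var_without.values()))
mean_var_with = mean(list(var_with.values()))

# Per-image ratio: anchors may help for one image and hurt for another.
ratios = {
    image: variance_ratio(without_anchors[image], with_anchors[image])
    for image in without_anchors
}
```

In this toy data the anchors narrow the spread for one image and widen it for the other, which is the kind of image-dependent effect that an average over all images can hide. A formal analysis would replace the raw ratio with a test for equality of variances.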