Improved structural similarity metric for the visible quality measurement of images

Abstract. The visible quality assessment of images is important for evaluating the performance of image processing methods such as image correction, compression, and enhancement. Structural similarity is widely used to determine visible quality; however, existing structural similarity metrics cannot correctly assess the perceived visibility of images that have been slightly geometrically transformed or that have undergone significant regional distortion. We propose an improved structural similarity metric that is closer to human visual evaluation. Compared with existing metrics, the proposed method more correctly evaluates the similarity between an original image and various distorted images.


Introduction
It is crucial to assess image quality objectively in image processing applications, because such assessments allow the results of different methods to be compared and their performance to be evaluated. Many objective metrics are used to measure the performance of image correction, compression, and enhancement methods such as denoising, JPEG compression, super-resolution, and frame rate upconversion,[1][2][3][4][5][6][7] but almost none of them completely agrees with the subjective visibility perceived by humans, while subjective evaluation is usually too inconvenient, time-consuming, and expensive. The simplest and most widely used metrics are the mean squared error (MSE) and the peak signal-to-noise ratio (PSNR). The MSE is computed by averaging the squared differences of two signals, and the PSNR is the ratio, in decibels, between the squared maximum value (Max) of a signal and the MSE:
$$\mathrm{MSE} = \frac{1}{M}\sum_{i=1}^{M}(x_i - y_i)^2, \tag{1}$$
$$\mathrm{PSNR} = 10\log_{10}\frac{\mathrm{Max}^2}{\mathrm{MSE}}, \tag{2}$$
where $x_i$ and $y_i$ are the $i$'th pixel values of the two images and $M$ is the number of pixels.[10][11][12][13] Many image quality assessment methods based on error sensitivity have been proposed;[14][15][16][17][18][19] they use the human visual system (HVS), the contrast sensitivity function, the discrete cosine transform, the wavelet transform, and so forth. However, the errors they measure can differ considerably from the actual loss of quality, so some distortions are clearly visible yet are not clearly reflected in these errors. Recently, structural similarity (SSIM) has been widely used to determine visible quality.[8][20]
SSIM is a full-reference image quality assessment method that indicates how similar an image is to the original image. It has three main components: structure, luminance, and contrast. However, these components, especially the structure component, are highly sensitive to translation, scaling, and rotation of an image. This means that even when an image is translated or rotated by an imperceptibly small amount, the SSIM decreases sharply.[21] Moreover, SSIM may overestimate images that have undergone regional distortions such as JPEG compression.
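Eqs. (1) and (2) can be sketched in a few lines of Python with NumPy. This is a minimal illustration rather than the authors' code; it assumes 8-bit images, so Max defaults to 255, and the function names `mse` and `psnr` are our own.

```python
import numpy as np

def mse(x: np.ndarray, y: np.ndarray) -> float:
    """Eq. (1): mean of the squared pixel differences over all M pixels."""
    x = np.asarray(x, dtype=np.float64)   # cast first so uint8 subtraction cannot wrap
    y = np.asarray(y, dtype=np.float64)
    if x.shape != y.shape:
        raise ValueError("images must have the same shape")
    return float(np.mean((x - y) ** 2))

def psnr(x: np.ndarray, y: np.ndarray, max_value: float = 255.0) -> float:
    """Eq. (2): 10 * log10(Max^2 / MSE) in dB; infinite for identical images."""
    err = mse(x, y)
    if err == 0.0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / err))

# Usage: an 8-bit image offset by a constant 5 gray levels.
x = np.full((4, 4), 100, dtype=np.uint8)
y = x + 5
print(mse(x, y))   # 25.0
print(psnr(x, y))
```

A uniform offset of 5 gray levels already costs a finite PSNR even though the structure of the image is unchanged, which is one reason the text turns to structural metrics.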
In this paper, we develop an improved structural similarity metric that outperforms the conventional SSIM by overcoming these drawbacks. The proposed metric uses an improved structure comparison and additionally uses a sharpness comparison.

SSIM and Its Drawbacks
Since humans usually rely on contrast, color, and frequency changes when judging image quality,[22] the SSIM uses the luminance, contrast, and structure comparisons shown in Fig. 1.[8][22] The SSIM of two images x and y is defined by a combination $f(\cdot)$ of three components as follows:[8]
$$\mathrm{SSIM}(x,y) = f[l(x,y), c(x,y), s(x,y)], \tag{3}$$
where l, c, and s are the luminance, contrast, and structure comparison functions, respectively, defined by
$$l(x,y) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}, \tag{4}$$
$$c(x,y) = \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}, \tag{5}$$
$$s(x,y) = \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3}, \tag{6}$$
where $\mu_x$ and $\sigma_x$ denote the mean and the standard deviation of x, $\mu_y$ and $\sigma_y$ denote the mean and the standard deviation of y, and $\sigma_{xy}$ denotes the covariance between x and y. The local statistics are calculated within a local window with circularly symmetric Gaussian weights $w = \{w_i \mid i = 1, 2, \ldots, N\}$, where $\sum_{i=1}^{N} w_i = 1$, as follows:
$$\mu_x = \sum_{i=1}^{N} w_i x_i, \tag{7}$$
$$\sigma_x = \left(\sum_{i=1}^{N} w_i (x_i - \mu_x)^2\right)^{1/2}, \tag{8}$$
$$\sigma_{xy} = \sum_{i=1}^{N} w_i (x_i - \mu_x)(y_i - \mu_y), \tag{9}$$
where i is the index of a pixel in the Gaussian window and N is the total number of pixels in the Gaussian window.
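A minimal NumPy sketch of Eqs. (4) to (9) follows: it computes the Gaussian-weighted local statistics and the l, c, and s maps at every 11 × 11 window position that fits inside the image. The window standard deviation of 1.5 and the constants $C_1 = (0.01 \cdot 255)^2$, $C_2 = (0.03 \cdot 255)^2$, $C_3 = C_2/2$ are values commonly used with SSIM and are assumptions here, not values stated in this section; the helper names are hypothetical. `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def gaussian_window(size=11, sigma=1.5):
    """Circularly symmetric Gaussian weights w_i, normalized so that sum(w_i) = 1."""
    r = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(r ** 2) / (2.0 * sigma ** 2))
    w = np.outer(g, g)
    return w / w.sum()

def local_stats(x, y, w):
    """Eqs. (7)-(9) for every window position that lies fully inside the image."""
    xw = sliding_window_view(np.asarray(x, dtype=np.float64), w.shape)  # (H-k+1, W-k+1, k, k)
    yw = sliding_window_view(np.asarray(y, dtype=np.float64), w.shape)
    mu_x = np.einsum("ijkl,kl->ij", xw, w)                  # Eq. (7)
    mu_y = np.einsum("ijkl,kl->ij", yw, w)
    dx = xw - mu_x[..., None, None]
    dy = yw - mu_y[..., None, None]
    sd_x = np.sqrt(np.einsum("ijkl,kl->ij", dx * dx, w))    # Eq. (8)
    sd_y = np.sqrt(np.einsum("ijkl,kl->ij", dy * dy, w))
    cov_xy = np.einsum("ijkl,kl->ij", dx * dy, w)           # Eq. (9)
    return mu_x, mu_y, sd_x, sd_y, cov_xy

def lcs_maps(x, y, w, C1, C2, C3):
    """Eqs. (4)-(6): luminance, contrast, and structure comparison maps."""
    mu_x, mu_y, sd_x, sd_y, cov_xy = local_stats(x, y, w)
    l = (2 * mu_x * mu_y + C1) / (mu_x ** 2 + mu_y ** 2 + C1)
    c = (2 * sd_x * sd_y + C2) / (sd_x ** 2 + sd_y ** 2 + C2)
    s = (cov_xy + C3) / (sd_x * sd_y + C3)
    return l, c, s

L = 255.0                                   # assumed 8-bit dynamic range
C1, C2 = (0.01 * L) ** 2, (0.03 * L) ** 2   # assumed constants (common SSIM defaults)
C3 = C2 / 2
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 255.0, size=(32, 32))
y = 0.5 * x + 40.0                          # same structure, lower contrast, different mean
l, c, s = lcs_maps(x, y, gaussian_window(), C1, C2, C3)
print(l.shape, float(c.mean()), float(s.mean()))
```

Because y is an affine function of x, its structure map s stays at 1 while c drops, which is the separation of structure from contrast that Eqs. (5) and (6) are meant to provide.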
The combination of all comparisons between two images x and y is
$$\mathrm{SSIM}(x,y) = [l(x,y)]^{\alpha}\cdot[c(x,y)]^{\beta}\cdot[s(x,y)]^{\gamma}, \tag{10}$$
where $\alpha > 0$, $\beta > 0$, and $\gamma > 0$ are parameters used to adjust the relative importance of the three components. To simplify the expression and give the three components equal importance, they are generally set to $\alpha = \beta = \gamma = 1$ with $C_3 = C_2/2$, and we set the parameters in the same manner.[8][21] This results in the following specific form of the SSIM index:
$$\mathrm{SSIM}(x,y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}. \tag{11}$$
To obtain a single overall quality measure for the entire image, a mean SSIM (MSSIM) index is used:
$$\mathrm{MSSIM}(X,Y) = \frac{1}{M}\sum_{j=1}^{M}\mathrm{SSIM}(x_j, y_j), \tag{12}$$
where X and Y are the original and the distorted images, respectively, $x_j$ and $y_j$ are the image contents in the $j$'th local window, and M is the number of pixels of the images, as in Eq. (1).[8] MSSIM can be interpreted as the mean value of the SSIM index map.[23] Because SSIM values have the range of [0, 1], MSSIM has the same range. The SSIM and MSSIM can be used to measure the similarity of two images. However, they have some drawbacks, as shown in Fig. 2 and Table 2. First, images smoothed by low-pass operations such as a mean filter (MF), a median filter (MedF), and JPEG compression are evaluated as having high similarity scores. Second, images that have been slightly distorted by geometric transformations, such as spatial translation (ST) and rotation (RT), are evaluated as having low similarity scores.
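Eqs. (11) and (12) can be sketched the same way. With $\alpha = \beta = \gamma = 1$ and $C_3 = C_2/2$, the product in Eq. (10) reduces to Eq. (11), so the sketch evaluates Eq. (11) directly. The assumptions are as before (Gaussian window standard deviation 1.5, $K_1 = 0.01$, $K_2 = 0.03$, dynamic range 255), and only windows lying fully inside the image are used, so the mean is taken over the valid window positions rather than over all M pixels. The function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def gaussian_window(size=11, sigma=1.5):
    """Normalized, circularly symmetric Gaussian weights (sum = 1)."""
    r = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(r ** 2) / (2.0 * sigma ** 2))
    w = np.outer(g, g)
    return w / w.sum()

def ssim_map(x, y, size=11, sigma=1.5, L=255.0, K1=0.01, K2=0.03):
    """Eq. (11) evaluated at every window position fully inside the image."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    w = gaussian_window(size, sigma)
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2   # C3 = C2 / 2 is already folded into Eq. (11)

    def wmean(a):
        # Gaussian-weighted local mean, i.e. Eq. (7) applied to array a
        return np.einsum("ijkl,kl->ij", sliding_window_view(a, w.shape), w)

    mu_x, mu_y = wmean(x), wmean(y)
    var_x = wmean(x * x) - mu_x ** 2        # equals Eq. (8) squared because sum(w) = 1
    var_y = wmean(y * y) - mu_y ** 2
    cov_xy = wmean(x * y) - mu_x * mu_y     # Eq. (9)
    num = (2 * mu_x * mu_y + C1) * (2 * cov_xy + C2)
    den = (mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2)
    return num / den

def mssim(X, Y, **kwargs):
    """Eq. (12): mean of the SSIM index map over the valid window positions."""
    return float(np.mean(ssim_map(X, Y, **kwargs)))

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 255.0, size=(64, 64))
Y = X + rng.normal(0.0, 20.0, size=X.shape)   # additive Gaussian noise
print(mssim(X, X), mssim(X, Y))
```

Computing the variance as a weighted mean of squares minus the squared mean avoids materializing the deviation arrays; it is algebraically identical to Eq. (8) because the weights sum to 1.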

New Structural Similarity
The main component of the SSIM that causes these drawbacks is the structure comparison defined by Eq. (6). When Eq. (3) combines only Eqs. (4) and (5), slightly geometrically transformed images no longer receive low similarities, as shown in Fig. 3 and Table 1, where $\bar{l}(x,y)$, $\bar{c}(x,y)$, and $\bar{s}(x,y)$ are the means of $l(x,y)$ in Eq. (4), $c(x,y)$ in Eq. (5), and $s(x,y)$ in Eq. (6), respectively. In Table 1, $\bar{s}(x,y)$ of the ST image is very low, while $\bar{s}(x,y)$ of the JPEG image is higher than that of the ST image. This example illustrates the limitation of SSIM: it is sensitive to ST, scaling, and RT.
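The component behavior described above can be reproduced qualitatively with a small experiment. The sketch below, which reuses the assumed constants and 11 × 11 Gaussian window from the previous sketches, compares $\bar{l}$, $\bar{c}$, and $\bar{s}$ for a one-pixel translation (ST) and a 3 × 3 mean filter (MF). Two assumptions matter: the test image is a synthetic random texture with independent pixels, which exaggerates the loss of correlation under a shift compared with natural images, and the mean filter stands in for the JPEG blur of Table 1. With these inputs, $\bar{s}$ should come out near zero for ST and clearly positive for MF, while $\bar{c}$ behaves the other way round.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def gaussian_window(size=11, sigma=1.5):
    r = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(r ** 2) / (2.0 * sigma ** 2))
    w = np.outer(g, g)
    return w / w.sum()

def component_means(x, y, w, C1, C2, C3):
    """Means of the l, c, and s maps of Eqs. (4)-(6) over all valid windows."""
    def wmean(a):
        return np.einsum("ijkl,kl->ij", sliding_window_view(a, w.shape), w)
    mu_x, mu_y = wmean(x), wmean(y)
    var_x = np.maximum(wmean(x * x) - mu_x ** 2, 0.0)   # clamp tiny negative rounding
    var_y = np.maximum(wmean(y * y) - mu_y ** 2, 0.0)
    cov_xy = wmean(x * y) - mu_x * mu_y
    sd_x, sd_y = np.sqrt(var_x), np.sqrt(var_y)
    l = (2 * mu_x * mu_y + C1) / (mu_x ** 2 + mu_y ** 2 + C1)
    c = (2 * sd_x * sd_y + C2) / (var_x + var_y + C2)
    s = (cov_xy + C3) / (sd_x * sd_y + C3)
    return float(l.mean()), float(c.mean()), float(s.mean())

L = 255.0
C1, C2 = (0.01 * L) ** 2, (0.03 * L) ** 2   # assumed constants
C3 = C2 / 2
w = gaussian_window()

rng = np.random.default_rng(1)
texture = rng.uniform(0.0, 255.0, size=(48, 49))   # synthetic random texture (assumption)
x = texture[:, :-1]                                  # reference image, 48 x 48
st = texture[:, 1:]                                  # ST: same content shifted by one pixel
mf = sliding_window_view(np.pad(x, 1, mode="edge"), (3, 3)).mean(axis=(-2, -1))  # 3x3 MF

results = {name: component_means(x, y, w, C1, C2, C3) for name, y in (("ST", st), ("MF", mf))}
for name, (l_bar, c_bar, s_bar) in results.items():
    print(f"{name}: l={l_bar:.3f}  c={c_bar:.3f}  s={s_bar:.3f}")
```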
To reduce this adverse effect of $s(x,y)$, we define the structure comparison in a new way, where $\sigma_{x-}$ and $\sigma_{x+}$ denote the standard deviations of the elements of x that are smaller and larger than $\mu_x$, respectively, and $\sigma_{y-}$ and $\sigma_{y+}$ denote the same quantities for y. In Ref. 8, structural information in an image is defined as the attributes that represent the structure of objects in the scene, independent of the average luminance and contrast, and structure comparison is conducted after luminance subtraction and variance normalization. Therefore, $s(x,y)$ in Eq. (6) is the correlation between the standard scores (z-scores),[24] $(x - \mu_x)/\sigma_x$ and $(y - \mu_y)/\sigma_y$. In contrast, we define the new structure comparison as the correlation between the standard deviations of the pixels with negative and positive standard scores, because $\sigma_{x-}$ and $\sigma_{x+}$ can represent the structure of objects by dividing the pixels into locally darker and brighter regions. As shown in Fig. 3 and Table 1, the adverse effect of the structure comparison is reduced compared with the original SSIM; however, the similarity of the ST image is still lower than that of the JPEG image. That is to say, the metric still overestimates blurred images when the new structure comparison is used. Therefore, we add a new component, the sharpness comparison $h(x,y)$, which is the correlation between the normalized digital Laplacians of the two images, where $\nabla^2 x$ and $\nabla^2 y$ denote the normalized digital Laplacians of x and y. The new similarity components $s(x,y)$ and $h(x,y)$ satisfy the following properties of similarity measures:
1. Symmetry: $S(x,y) = S(y,x)$;
2. Boundedness: $S(x,y) \le 1$;
3. Unique maximum: $S(x,y) = 1$ if and only if $x = y$.
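The sketch below covers only the ingredients of the proposed structure comparison, $\sigma_{x-}$ and $\sigma_{x+}$, not the comparison formula itself. It is one possible reading, labeled as an assumption: pixels are weighted with the same Gaussian window, deviations are measured from the local mean $\mu_x$ (a semideviation; deviations from each subset's own mean would be an equally plausible reading), and the value is set to 0 when one side of the mean is empty. The helper names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def gaussian_window(size=11, sigma=1.5):
    r = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(r ** 2) / (2.0 * sigma ** 2))
    w = np.outer(g, g)
    return w / w.sum()

def split_std(x, w):
    """Per-window (sigma_minus, sigma_plus) for one image (hypothetical helper)."""
    xw = sliding_window_view(np.asarray(x, dtype=np.float64), w.shape)
    mu = np.einsum("ijkl,kl->ij", xw, w)[..., None, None]   # local mean mu_x, Eq. (7)
    d2 = (xw - mu) ** 2                                      # squared deviation from mu_x

    def weighted_rms(mask):
        wm = w * mask                         # Gaussian weights of the selected pixels only
        total = wm.sum(axis=(-2, -1))
        num = (wm * d2).sum(axis=(-2, -1))
        # One side of the mean can be empty (e.g. a flat window): report 0 there.
        return np.sqrt(np.divide(num, total, out=np.zeros_like(num), where=total > 0))

    return weighted_rms(xw < mu), weighted_rms(xw > mu)   # darker side, brighter side

i, j = np.indices((24, 24))
x = 100.0 * ((i + j) % 2)                  # two-level checkerboard, levels 0 and 100
sm, sp = split_std(x, gaussian_window())
print(np.allclose(sm + sp, 100.0))         # True: the two deviations span the level gap
sm_inv, sp_inv = split_std(255.0 - x, gaussian_window())
print(np.allclose(sm_inv, sp) and np.allclose(sp_inv, sm))   # True: inversion swaps sides
```

The inversion check shows why the split is informative: the darker and brighter regions trade places, so the pair $(\sigma_{x-}, \sigma_{x+})$ carries the local polarity of the structure that a single $\sigma_x$ does not.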
As shown in Fig. 4, the mean of $h(x,y)$ of the ST image is higher than that of the JPEG image. Finally, the improved SSIM that includes the sharpness comparison (ISSIM-S) is defined as
$$\text{ISSIM-S}(x,y) = l(x,y)\cdot c(x,y)\cdot s(x,y)\cdot h(x,y), \tag{16}$$
where $s(x,y)$ is the proposed structure comparison, and the proposed ISSIM-S measurement system is configured as shown in Fig. 4. The index maps of the SSIM and the ISSIM-S for the images of Fig. 2 are shown in Fig. 5. The pixel values of an index map are the normalized SSIM or ISSIM-S values. The index maps differ: the ISSIM-S index maps are darker than the SSIM index maps because the mean ISSIM-S (MISSIM-S) values are lower than the MSSIM values. In contrast, the ISSIM-S index maps for IN, ST, and RT are brighter than those of the SSIM, because the ISSIM-S assigns these images higher similarities than the SSIM does, as shown in Fig. 6. The index maps of MLS are very similar, as shown in Fig. 7.
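Eq. (16) is a pixelwise product, so once the four comparison maps are available, the ISSIM-S index map and its mean follow directly. In the sketch below, the proposed s and h maps are taken as inputs because their formulas are not implemented here; the constant maps in the usage lines are dummy values, and the 0 to 255 display scaling is an assumed normalization for viewing index maps like those in Figs. 5 to 7.

```python
import numpy as np

def issim_s_map(l, c, s, h):
    """Eq. (16): pixelwise product l * c * s * h.

    s and h must be the proposed structure and sharpness comparison maps;
    they are inputs here because their formulas are not implemented.
    """
    maps = [np.asarray(m, dtype=np.float64) for m in (l, c, s, h)]
    if len({m.shape for m in maps}) != 1:
        raise ValueError("all comparison maps must have the same shape")
    return maps[0] * maps[1] * maps[2] * maps[3]

def to_display(index_map):
    """Assumed normalization for viewing: clip to [0, 1], scale to 0..255 (brighter = more similar)."""
    return np.round(np.clip(index_map, 0.0, 1.0) * 255.0).astype(np.uint8)

shape = (4, 4)
l, c = np.full(shape, 1.0), np.full(shape, 0.9)   # dummy maps, not real data
s, h = np.full(shape, 0.8), np.full(shape, 0.5)
index_map = issim_s_map(l, c, s, h)
missim_s = float(index_map.mean())                # mean ISSIM-S (MISSIM-S)
print(missim_s, to_display(index_map)[0, 0])
```

Because every factor is at most 1, multiplying in an extra comparison can only keep or lower each local value, which is consistent with the ISSIM-S index maps generally being darker than the SSIM maps.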
To compare the metrics with the mean opinion scores (MOSs), the ranks of the PSNR, mean SSIM, mean ISSIM-S, and MOS are shown in Table 2. To measure the MOSs, we showed subjects the result image of each processing method together with the original image and collected their opinion scores, which range from 1 (not similar) to 5 (very similar). Each comparison was made one-on-one against the original image, and we randomized the order of the distorted images to minimize order effects. There were 17 test subjects, none of whom had any problems with their eyes. The experiments were conducted under controlled illumination and display conditions.
The scores themselves are subjective and not reliable in absolute terms, but they are meaningful in relative comparison. Therefore, we used MOS ranks instead of the MOS itself. The rank correlations with the MOS rank are also shown; they are computed by Spearman's rank correlation coefficient $\rho$,[25] defined as
$$\rho = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}, \tag{18}$$
where $d_i$ denotes the difference between the $i$'th ranks and n denotes the number of ranked items. The rank correlation of the mean ISSIM-S is closer to 1 than those of the other metrics. We also compared the PSNR, SSIM, ISSIM-S, and MOS on another image, shown in Fig. 8, and the results are shown in Table 3. The types of distortion are exactly the same as in Table 2; the only difference is the filter size. The test images in Table 2 have a resolution of 256 × 256 and the filter size is 11 × 11, whereas the test image in Fig. 8 has a resolution of 128 × 128, so we set the filter size to 5 × 5.
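Eq. (18) can be checked with a few lines of Python. The helper below assumes untied ranks (the closed form is exact only in that case), and `to_ranks` assigns rank 1 to the highest score, matching the convention that a higher metric value means higher similarity. The scores in the usage lines are made-up illustrations, not values from Table 2 or Table 3.

```python
def to_ranks(scores):
    """Rank 1 = highest score; assumes no ties (hypothetical helper)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks

def spearman_rho(rank_a, rank_b):
    """Eq. (18): rho = 1 - 6 * sum(d_i^2) / (n (n^2 - 1)), for untied ranks."""
    n = len(rank_a)
    if n != len(rank_b):
        raise ValueError("rank lists must have equal length")
    if n < 2:
        raise ValueError("need at least two ranked items")
    expected = list(range(1, n + 1))
    if sorted(rank_a) != expected or sorted(rank_b) != expected:
        raise ValueError("ranks must be a permutation of 1..n (no ties)")
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

metric_scores = [0.91, 0.75, 0.88, 0.40, 0.62]   # hypothetical metric values
mos_ranks = [2, 3, 1, 5, 4]                      # hypothetical MOS ranks
print(spearman_rho(to_ranks(metric_scores), mos_ranks))   # 0.9
```

Here the metric ranks are [1, 3, 2, 5, 4], so only the first and third images swap places relative to the MOS ranks, giving $\sum d_i^2 = 2$ and $\rho = 1 - 12/120 = 0.9$.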
To evaluate performance at different distortion levels, we tested additional images: images blurred by MFs of different sizes, images compressed by JPEG at various quality levels, and images translated by different amounts (ST), as shown in Fig. 9 and Table 4. As the distortion level increases, the PSNR, MSSIM, and mean ISSIM-S all decrease, regardless of the processing type. However, for ST, the PSNR and MSSIM already reach their lowest values when the image is translated by only 3 pixels along the y axis, whereas the mean ISSIM-S does not. ISSIM-S is also affected by translation, but it is less sensitive than the PSNR and SSIM.
We conducted two additional experiments. First, ST, MF, and JPEG compression are compared for various scene contents in Fig. 10 and Table 5. The tested images in this experiment have a resolution of 256 × 256. For each image, the PSNR and the mean SSIM rank the distortions in the order ST < MF < JPEG (from lowest to highest score), whereas the mean ISSIM-S gives the order MF < JPEG < ST. The ordering of the ISSIM-S is more reasonable than that of the PSNR or SSIM. This result shows that the proposed image quality assessment method does not overestimate blurred images and is much less sensitive to geometric transformations, which were identified drawbacks of SSIM. Second, as shown in Fig. 11 and Table 6, we compared the PSNR, the mean SSIM, and the mean ISSIM-S for various combinations of degradations. The drawback of SSIM, namely its excessive sensitivity to geometric translation, also appears when degradations are combined. This result shows that MSSIM overvalues HE+IN, while MISSIM-S evaluates it moderately. This suggests that MISSIM-S is closer to the HVS, because, like the HVS, it is less sensitive to small geometric translations.
In addition, we tested how MSSIM and MISSIM-S vary with the size of the Gaussian window, as shown in Fig. 12; an 11 × 11 window is large enough, because the variations are very small for window sizes larger than 11 × 11.

Conclusion
In this paper, we have proposed an improved structural similarity metric that uses new structure and sharpness comparison functions to overcome the drawbacks of the SSIM metric. The structure comparison uses standard deviations computed separately for pixels below and above the mean, and the sharpness comparison uses the normalized digital Laplacian. The proposed metric assigns high similarities to slightly geometrically transformed images and does not overestimate blurred images such as JPEG-compressed images.
The experimental results indicate that our similarity metric agrees better with human perception than existing methods. Therefore, our method can be used to evaluate the performance of various methods such as image enhancement, frame rate upconversion, image compression, super-resolution, and image restoration.

$C_1$, $C_2$, and $C_3$ in Eqs. (4) to (6) are constants used to avoid instability when the denominators are very close to zero. The values of l, c, and s are in [0, 1], and values close to 1 indicate higher similarity for each comparison function.

Fig. 3 Comparison of the original, ST, and JPEG compression image.

Fig. 5 Comparison of image similarity (from left to right: the evaluated images of Fig. 2, index maps of the SSIM, and index maps of the ISSIM-S).

Fig. 6 Comparison of image similarity (from left to right: the evaluated images, index maps of the SSIM, and index maps of the ISSIM-S).

Fig. 7 Comparison of image similarity (from left to right: the evaluated images, index maps of the SSIM, and index maps of the ISSIM-S).

Fig. 8 Comparison of "Einstein" image similarity (from left to right: the evaluated images, index maps of the SSIM, and index maps of the ISSIM-S).

Fig. 9 Comparison of image similarity for different distortion levels (the numbers in parentheses indicate the MF filter sizes, the JPEG quality factors, and the ST shifts in pixels).

Fig. 12 Variations of MSSIM and MISSIM-S in terms of the size of the Gaussian window.

Table 1
Comparison of MSSIM and its components with MISSIM-S and its components for Fig. 3.

Fig. 4 Diagram of the proposed ISSIM-S measurement system.

Journal of Electronic Imaging 063015-3 Nov∕Dec 2016 • Vol.25(6)

Table 2
Comparison of the PSNR, mean of the SSIM, mean of the ISSIM-S, and MOS rank for the "Lena" image (the rank for each metric is shown in parentheses).

Table 3
Comparison of the PSNR, mean of the SSIM, mean of the ISSIM-S, and MOS rank for the "Einstein" image (the rank for each metric is shown in parentheses).
Lee and Lim: Improved structural similarity metric for the visible quality measurement of images

Table 4
Comparison of the PSNR, mean of the SSIM, and mean of the ISSIM-S for different distortion levels.

Table 6
Comparison of the PSNR, mean of the SSIM, and mean of the ISSIM-S for various combinations of degradations.